Abstract

We study the distribution of hard-, soft-, and adaptive soft-thresholding estimators
within a linear regression model where the number of parameters $k$ can depend on sample
size $n$ and may diverge with $n$. In addition to the case of known error-variance, we
define and study versions of the estimators when the error-variance is unknown. We derive
the finite-sample distribution of each estimator and study its behavior in the large-sample
limit, also investigating the effects of having to estimate the variance when the degrees of
freedom $n - k$ does not tend to infinity or tends to infinity very slowly. Our analysis
encompasses both the case where the estimators are tuned to perform consistent variable selection
and the case where the estimators are tuned to perform conservative variable selection.
Furthermore, we discuss consistency and uniform consistency, and derive the uniform convergence
rate under either type of tuning.

MSC subject classification: 62F11, 62F12, 62J05, 62J07, 62E15, 62E20
Keywords and phrases: Thresholding, Lasso, adaptive Lasso, penalized maximum likelihood,
variable selection, finite-sample distribution, asymptotic distribution, variance estimation,
uniform convergence rate, high-dimensional model, oracle property

1 Introduction 

We study the distribution of thresholding estimators such as hard-thresholding, soft-thresholding, 
and adaptive soft-thresholding in a linear regression model when the number of regressors can 
be large. These estimators can be viewed as penalized least-squares estimators in the case 
of an orthogonal design matrix, with soft-thresholding then coinciding with the Lasso (introduced
by Frank and Friedman (1993), Alliney and Ruzinsky (1994), and Tibshirani (1996)) and
with adaptive soft-thresholding coinciding with the adaptive Lasso (introduced by Zou (2006)). 
Thresholding estimators have of course been discussed earlier in the context of model selec- 
tion (see Bauer, Potscher and Hackl (1988)) and in the context of wavelets (see, e.g., Donoho, 
Johnstone, Kerkyacharian, Picard (1995)). Contributions concerning distributional properties of 
thresholding and penalized least-squares estimators are as follows: Knight and Fu (2000) study 
the asymptotic distribution of the Lasso estimator when it is tuned to act as a conservative vari- 
able selection procedure, whereas Zou (2006) studies the asymptotic distribution of the Lasso 
and the adaptive Lasso estimators when they are tuned to act as consistent variable selection 
procedures. Fan and Li (2001) and Fan and Peng (2004) study the asymptotic distribution of 
the so-called smoothly clipped absolute deviation (SCAD) estimator when it is tuned to act as
a consistent variable selection procedure. In the wake of Fan and Li (2001) and Fan and Peng 
(2004) a large number of papers have been published that derive the asymptotic distribution of 
various penalized maximum likelihood estimators under consistent tuning; see the introduction 
in Potscher and Schneider (2009) for a partial list. Except for Knight and Fu (2000), all these 
papers derive the asymptotic distribution in a fixed-parameter framework. As pointed out in 
Leeb and Potscher (2005), such a fixed-parameter framework is often highly misleading in the 
context of variable selection procedures and penalized maximum likelihood estimators. For that 
reason, Potscher and Leeb (2009) and Potscher and Schneider (2009) have conducted a detailed
study of the finite-sample as well as large-sample distribution of various penalized least-squares 
estimators, adopting a moving-parameter framework for the asymptotic results. [Related results 
for so-called post-model-selection estimators can be found in Leeb and Potscher (2003, 2005)
and for model averaging estimators in Potscher (2006); see also Sen (1979) and Potscher (1991).] 
The papers by Potscher and Leeb (2009) and Potscher and Schneider (2009) are set in the frame- 
work of an orthogonal linear regression model with a fixed number of parameters and with the 
error-variance being known. 

In the present paper we build on the just mentioned papers Potscher and Leeb (2009) and 
Potscher and Schneider (2009). In contrast to these papers, we do not assume the number of 
regressors k to be fixed, but let it depend on sample size - thus allowing for high-dimensional 
models. We also consider the case where the error-variance is unknown, which in case of a high- 
dimensional model creates non-trivial complications as then estimators for the error-variance will 
typically not be consistent. Considering thresholding estimators from the outset in the present 
paper allows us also to cover non-orthogonal design. While the asymptotic distributional results 
in the known-variance case do not differ in substance from the results in Potscher and Leeb 
(2009) and Potscher and Schneider (2009), not unexpectedly we observe different asymptotic
behavior in the unknown-variance case if the number of degrees of freedom $n - k$ is constant, the
difference resulting from the non-vanishing variability of the error-variance estimator in the limit.
Less expected is the result that - under consistent tuning - for the variable selection probabilities
(implied by all the estimators considered) as well as for the distribution of the hard-thresholding
estimator, estimation of the error-variance still has an effect asymptotically even if $n - k$ diverges,
but does so only slowly.

To give some idea of the theoretical results obtained in the paper we next present a rough
summary of some of these results. For simplicity of exposition assume for the moment that the
$n \times k$ design matrix $X$ is such that the diagonal elements of $(X'X/n)^{-1}$ are equal to 1, and that
the error-variance $\sigma^2$ is equal to 1. Let $\tilde\theta_{H,i}$ denote the hard-thresholding estimator for the $i$-th
component $\theta_i$ of the regression parameter, the threshold being given by $\hat\sigma\eta_{i,n}$, with $\hat\sigma^2$ denoting
the usual error-variance estimator and with $\eta_{i,n}$ denoting a tuning parameter. An infeasible
version of the estimator, denoted by $\hat\theta_{H,i}$, which uses $\sigma$ instead of $\hat\sigma$, is also considered
(known-variance case). We then show that the uniform rate of convergence of the hard-thresholding
estimator is $n^{-1/2}$ if the threshold satisfies $\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i < \infty$ ("conservative
tuning"), but that the uniform rate is only $\eta_{i,n}$ if the threshold satisfies $\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$
("consistent tuning"). The same result also holds for the soft-thresholding estimator $\tilde\theta_{S,i}$ and the
adaptive soft-thresholding estimator $\tilde\theta_{AS,i}$, as well as for infeasible variants of the estimators that
use knowledge of $\sigma$ (known-variance case). Furthermore, all possible limits of the centered and
scaled distribution of the hard-thresholding estimator $\tilde\theta_{H,i}$ (as well as of the soft- and the adaptive
soft-thresholding estimators $\tilde\theta_{S,i}$ and $\tilde\theta_{AS,i}$) under a moving parameter framework are obtained.
Consider first the case of conservative tuning: then all possible limiting forms of the distribution
of $n^{1/2}(\tilde\theta_{H,i} - \theta_{i,n})$ as well as of $n^{1/2}(\hat\theta_{H,i} - \theta_{i,n})$ for arbitrary parameter sequences $\theta_{i,n}$ are
determined. It turns out that - in the known-variance case - these limits are of the same
functional form as the finite-sample distribution, i.e., they are a convex combination of a pointmass
and an absolutely continuous distribution that is an excised version of a normal distribution. In
the unknown-variance case, when the number of degrees of freedom $n - k$ goes to infinity, exactly
the same limits arise. However, if $n - k$ is constant, the limits are "averaged" versions of the
limits in the known-variance case, the averaging being with respect to the distribution of the
variance estimator $\hat\sigma^2$. Again these limits have the same functional form as the corresponding
finite-sample distributions. Consider next the case of consistent tuning: Here the possible limits
of $\eta_{i,n}^{-1}(\tilde\theta_{H,i} - \theta_{i,n})$ as well as of $\eta_{i,n}^{-1}(\hat\theta_{H,i} - \theta_{i,n})$ have to be considered, as $\eta_{i,n}$ is the uniform
convergence rate. In the known-variance case the limits are convex combinations of (at most)
two pointmasses, the location of the pointmasses as well as the weights depending on $\theta_{i,n}$ and
$\eta_{i,n}$. In the unknown-variance case exactly the same limits arise if $n - k$ diverges to infinity
sufficiently fast; however, if $n - k$ is constant or diverges to infinity sufficiently slowly, the limits are
again convex combinations of the same pointmasses, but with weights that are typically different.
The picture for soft-thresholding and adaptive soft-thresholding is somewhat different: in the
known-variance case, as well as in the unknown-variance case when $n - k$ diverges to infinity, the
limits are (single) pointmasses. However, in the unknown-variance case and if $n - k$ is constant,
the limit distribution can have an absolutely continuous component. It is furthermore useful to
point out that in case of consistent tuning the sequence of distributions of $n^{1/2}(\tilde\theta_{H,i} - \theta_{i,n})$ is
not stochastically bounded in general (since $\eta_{i,n}$ is the uniform convergence rate), and the same
is true for soft-thresholding $\tilde\theta_{S,i}$ and adaptive soft-thresholding $\tilde\theta_{AS,i}$. This throws a light on the
fragility of the oracle-property, see Section 6.4 for more discussion.

While our theoretical results for the thresholding estimators immediately apply to Lasso and 
adaptive Lasso in case of orthogonal design, this is not so in the non-orthogonal case. In order 
to get some insight into the finite-sample distribution of the latter estimators also in the non- 
orthogonal case, we numerically compare the distribution of Lasso and adaptive Lasso with their 
thresholding counterparts in a simulation study. 



The main take-away messages of the paper can be summarized as follows: 

• The finite-sample distributions of the various thresholding estimators considered are highly
non-normal, the distributions being in each case a convex combination of pointmass and
an absolutely continuous (non-normal) component.

• The non-normality persists asymptotically in a moving parameter framework.

• Results in the unknown-variance case are obtained from the corresponding results in the
known-variance case by smoothing with respect to the distribution of $\hat\sigma$. In line with this,
one would expect the limiting behavior in the unknown-variance case to coincide with the
limiting behavior in the known-variance case whenever the degrees of freedom $n - k$ diverge to
infinity. This indeed turns out to be so for some of the results, but not for others where
we see that the speed of divergence of $n - k$ matters.

• In case of conservative tuning the estimators have the expected uniform convergence rate,
which is $n^{-1/2}$ under the simplified assumptions of the above discussion, whereas under
consistent tuning the uniform rate is slower, namely $\eta_{i,n}$ under the simplified assumptions
of the above discussion. This is intimately connected with the fact that the so-called 'oracle
property' paints a misleading picture of the performance of the estimators.

• The numerical study suggests that the results for the thresholding estimators $\tilde\theta_{S,i}$ and $\tilde\theta_{AS,i}$
qualitatively apply also to the (components of) the Lasso and the adaptive Lasso as long
as the design matrix is not too ill-conditioned.



The paper is organized as follows. We introduce the model and define the estimators in Section
2. Section 3 treats the variable selection probabilities implied by the estimators. Consistency,
uniform consistency, and uniform convergence rates are discussed in Section 4. We derive the
finite-sample distribution of each estimator in Section 5 and study the large-sample behavior of
these in Section 6. A numerical study of the finite-sample distribution of Lasso and adaptive
Lasso can be found in Section 7. All proofs are relegated to Section 8.



2 The Model and the Estimators 

Consider the linear regression model 

$$Y = X\theta + u$$

with $Y$ an $n \times 1$ vector, $X$ a nonstochastic $n \times k$ matrix of rank $k \geq 1$, and $u \sim N(0, \sigma^2 I_n)$,
$0 < \sigma < \infty$. We allow $k$, the number of columns of $X$, as well as the entries of $Y$, $X$, and $u$ to
depend on sample size $n$ (in fact, also the probability spaces supporting $Y$ and $u$ may depend on
$n$), although we shall almost always suppress this dependence on $n$ in the notation. Note that
this framework allows for high-dimensional regression models, where the number of regressors $k$
is large compared to sample size $n$, as well as for the more classical situation where $k$ is much
smaller than $n$. Furthermore, let $\xi_{i,n}$ denote the nonnegative square root of $((X'X/n)^{-1})_{ii}$, the
$i$-th diagonal element of $(X'X/n)^{-1}$. Now let

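$$\hat\theta_{LS} = (X'X)^{-1}X'Y$$

$$\hat\sigma^2 = (n - k)^{-1}(Y - X\hat\theta_{LS})'(Y - X\hat\theta_{LS})$$

denote the least-squares estimator for $\theta$ and the associated estimator for $\sigma^2$, the latter being
defined only if $n > k$. The hard-thresholding estimator $\tilde\theta_H$ is defined via its components as
follows

$$\tilde\theta_{H,i} = \tilde\theta_{H,i}(\eta_{i,n}) = \hat\theta_{LS,i}\,\mathbf{1}\left(|\hat\theta_{LS,i}| > \hat\sigma\xi_{i,n}\eta_{i,n}\right),$$

where the tuning parameters $\eta_{i,n}$ are positive real numbers and $\hat\theta_{LS,i}$ denotes the $i$-th component
of the least-squares estimator. We shall also need to consider its infeasible counterpart $\hat\theta_H$ given
by

$$\hat\theta_{H,i} = \hat\theta_{H,i}(\eta_{i,n}) = \hat\theta_{LS,i}\,\mathbf{1}\left(|\hat\theta_{LS,i}| > \sigma\xi_{i,n}\eta_{i,n}\right).$$

The soft-thresholding estimator $\tilde\theta_S$ and its infeasible counterpart $\hat\theta_S$ are given by

$$\tilde\theta_{S,i} = \tilde\theta_{S,i}(\eta_{i,n}) = \operatorname{sign}(\hat\theta_{LS,i})\left(|\hat\theta_{LS,i}| - \hat\sigma\xi_{i,n}\eta_{i,n}\right)_+$$

and

$$\hat\theta_{S,i} = \hat\theta_{S,i}(\eta_{i,n}) = \operatorname{sign}(\hat\theta_{LS,i})\left(|\hat\theta_{LS,i}| - \sigma\xi_{i,n}\eta_{i,n}\right)_+$$

where $(\cdot)_+ = \max(\cdot, 0)$. Finally, the adaptive soft-thresholding estimator $\tilde\theta_{AS}$ and its infeasible
counterpart $\hat\theta_{AS}$ are defined via

$$\tilde\theta_{AS,i} = \tilde\theta_{AS,i}(\eta_{i,n}) = \hat\theta_{LS,i}\left(1 - \hat\sigma^2\xi_{i,n}^2\eta_{i,n}^2/\hat\theta_{LS,i}^2\right)_+ =
\begin{cases}
0 & \text{if } |\hat\theta_{LS,i}| \leq \hat\sigma\xi_{i,n}\eta_{i,n} \\
\hat\theta_{LS,i} - \hat\sigma^2\xi_{i,n}^2\eta_{i,n}^2/\hat\theta_{LS,i} & \text{if } |\hat\theta_{LS,i}| > \hat\sigma\xi_{i,n}\eta_{i,n}
\end{cases}$$

and

$$\hat\theta_{AS,i} = \hat\theta_{AS,i}(\eta_{i,n}) = \hat\theta_{LS,i}\left(1 - \sigma^2\xi_{i,n}^2\eta_{i,n}^2/\hat\theta_{LS,i}^2\right)_+ =
\begin{cases}
0 & \text{if } |\hat\theta_{LS,i}| \leq \sigma\xi_{i,n}\eta_{i,n} \\
\hat\theta_{LS,i} - \sigma^2\xi_{i,n}^2\eta_{i,n}^2/\hat\theta_{LS,i} & \text{if } |\hat\theta_{LS,i}| > \sigma\xi_{i,n}\eta_{i,n}.
\end{cases}$$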

Note that $\tilde\theta_H$, $\tilde\theta_S$, and $\tilde\theta_{AS}$ as well as their infeasible counterparts are equivariant under
scaling of the columns of $(Y : X)$ by non-zero column-specific scale factors. We have chosen to
let the thresholds $\hat\sigma\xi_{i,n}\eta_{i,n}$ ($\sigma\xi_{i,n}\eta_{i,n}$, respectively) depend explicitly on $\hat\sigma$ ($\sigma$, respectively) and
$\xi_{i,n}$ in order to give $\eta_{i,n}$ an interpretation independent of the values of $\sigma$ and $X$. Furthermore,
often $\eta_{i,n}$ will be chosen independently of $i$, i.e., $\eta_{i,n} = \eta_n$ where $\eta_n$ is a positive real number.
Clearly, for the feasible versions we always need to assume $n > k$, whereas for the infeasible
versions $n \geq k$ suffices.
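The three thresholding rules defined above act on a single least-squares component and a single threshold, so they can be sketched in a few lines. The following is a minimal illustration only; the function names and the helper `threshold_value` are hypothetical and not part of the paper. Passing the estimated standard deviation as `scale` gives the feasible versions, passing the true one gives the infeasible versions.

```python
import math


def threshold_value(scale, xi, eta):
    """Threshold scale * xi_{i,n} * eta_{i,n} (scale = sigma-hat or sigma)."""
    return scale * xi * eta


def hard_threshold(theta_ls, threshold):
    """Keep the least-squares value only if it exceeds the threshold in absolute value."""
    return theta_ls if abs(theta_ls) > threshold else 0.0


def soft_threshold(theta_ls, threshold):
    """sign(theta_ls) * (|theta_ls| - threshold)_+ ."""
    return math.copysign(max(abs(theta_ls) - threshold, 0.0), theta_ls)


def adaptive_soft_threshold(theta_ls, threshold):
    """theta_ls * (1 - threshold^2 / theta_ls^2)_+ ."""
    if abs(theta_ls) <= threshold:
        return 0.0
    return theta_ls - threshold ** 2 / theta_ls


if __name__ == "__main__":
    t = threshold_value(scale=2.0, xi=0.5, eta=1.0)  # illustrative numbers only
    for value in (-1.5, 0.8, 1.5):
        print(value, hard_threshold(value, t), soft_threshold(value, t),
              adaptive_soft_threshold(value, t))
```

All three rules set a component to zero on exactly the same event, namely when the least-squares component does not exceed the threshold in absolute value; they differ only in how much the surviving components are shrunk.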

We note the simple fact that 



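$$0 \leq \tilde\theta_{S,i} \leq \tilde\theta_{AS,i} \leq \tilde\theta_{H,i} \leq \hat\theta_{LS,i} \tag{1}$$

holds on the event that $\hat\theta_{LS,i} \geq 0$, and that

$$\hat\theta_{LS,i} \leq \tilde\theta_{H,i} \leq \tilde\theta_{AS,i} \leq \tilde\theta_{S,i} \leq 0 \tag{2}$$

holds on the event that $\hat\theta_{LS,i} \leq 0$. Analogous inequalities hold for the infeasible versions of the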
estimators. 

Remark 1 (Lasso) (i) Consider the objective function

$$(Y - X\theta)'(Y - X\theta) + 2n\hat\sigma\sum_{i=1}^{k}\eta'_{i,n}|\theta_i|$$

where $\eta'_{i,n}$ are positive real numbers. It is well-known that a unique minimizer $\tilde\theta_L$ of this objective
function exists, the Lasso-estimator. It is easy to see that in case $X'X$ is diagonal we have

$$\tilde\theta_{L,i} = \operatorname{sign}(\hat\theta_{LS,i})\left(|\hat\theta_{LS,i}| - \hat\sigma\eta'_{i,n}\xi_{i,n}^2\right)_+.$$

Hence, in the case of diagonal $X'X$, the components $\tilde\theta_{L,i}$ of the Lasso reduce to soft-thresholding
estimators with appropriate thresholds; in particular, $\tilde\theta_{L,i}$ coincides with $\tilde\theta_{S,i}$ for the choice
$\eta'_{i,n} = \eta_{i,n}\xi_{i,n}^{-1}$. Therefore all results derived below for soft-thresholding immediately give corresponding
results for the Lasso as well as for the Dantzig-selector in the diagonal case. We shall abstain
from spelling out further details.

(ii) Sometimes $\eta'_{i,n}$ in the definition of the Lasso is chosen independently of $i$; more reasonable
choices seem to be (a) $\eta'_{i,n} = \eta_{i,n}\upsilon_{i,n}$ (where $\upsilon_{i,n}$ denotes the nonnegative square root of the
$i$-th diagonal element of $(X'X/n)$), and (b) $\eta'_{i,n} = \eta_{i,n}\xi_{i,n}^{-1}$ where $\eta_{i,n}$ are positive real numbers
(not depending on the design matrix and often not on $i$) as then $\eta_{i,n}$ again has an interpretation
independent of the values of $\sigma$ and $X$. Note that in case (a) or (b) the solution of the optimization
problem is equivariant under scaling of the columns of $(Y : X)$ by non-zero column-specific scale
factors.

(iii) Similar results obviously hold for the infeasible versions of the estimators.
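
The reduction of the Lasso to soft-thresholding for diagonal $X'X$ can be checked numerically. The sketch below rests on one stated assumption: with $(X'X)_{ii} = n/\xi_{i,n}^2$, the Lasso objective separates (up to a constant) into the one-dimensional convex summands $(n/\xi_{i,n}^2)(\theta_i - \hat\theta_{LS,i})^2 + 2n\hat\sigma\eta'_{i,n}|\theta_i|$, whose minimizer is the soft-thresholding rule with threshold $\hat\sigma\eta'_{i,n}\xi_{i,n}^2$. All function names are hypothetical, and the ternary search is just a convenient stand-in for any convex minimizer.

```python
import math


def lasso_component_objective(theta, theta_ls, n, xi, sigma_hat, eta_prime):
    """i-th summand of the Lasso objective for diagonal X'X (constants dropped)."""
    return ((n / xi ** 2) * (theta - theta_ls) ** 2
            + 2.0 * n * sigma_hat * eta_prime * abs(theta))


def minimize_convex(f, lo, hi, iterations=200):
    """Ternary search; valid because the summand is convex in theta."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def lasso_numeric(theta_ls, n, xi, sigma_hat, eta_prime):
    """Numerical minimizer of the i-th summand."""
    bound = abs(theta_ls) + 1.0
    return minimize_convex(
        lambda th: lasso_component_objective(th, theta_ls, n, xi, sigma_hat, eta_prime),
        -bound, bound)


def lasso_closed_form(theta_ls, xi, sigma_hat, eta_prime):
    """Soft-thresholding at sigma_hat * eta_prime * xi^2."""
    t = sigma_hat * eta_prime * xi ** 2
    return math.copysign(max(abs(theta_ls) - t, 0.0), theta_ls)


if __name__ == "__main__":
    for theta_ls in (1.0, -0.3, -2.0):  # illustrative values only
        print(theta_ls,
              lasso_numeric(theta_ls, n=50, xi=2.0, sigma_hat=1.5, eta_prime=0.1),
              lasso_closed_form(theta_ls, xi=2.0, sigma_hat=1.5, eta_prime=0.1))
```

With the choice $\eta'_{i,n} = \eta_{i,n}\xi_{i,n}^{-1}$ the threshold in `lasso_closed_form` becomes $\hat\sigma\xi_{i,n}\eta_{i,n}$, which is the soft-thresholding threshold used in the text.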

Remark 2 (Adaptive Lasso) Consider the objective function

$$(Y - X\theta)'(Y - X\theta) + 2n\hat\sigma^2\sum_{i=1}^{k}(\eta'_{i,n})^2\,|\theta_i| / |\hat\theta_{LS,i}|$$

where $\eta'_{i,n}$ are positive real numbers. This is the objective function of the adaptive Lasso (where
often $\eta'_{i,n} = \eta'_n$ is chosen independent of $i$). Again the minimizer $\tilde\theta_{AL}$ exists and is unique (at
least on the event where $\hat\theta_{LS,i} \neq 0$ for all $i$). Clearly, $\tilde\theta_{AL}$ is equivariant under scaling of the
columns of $(Y : X)$ by non-zero column-specific scale factors provided $\eta'_{i,n}$ does not depend on
the design matrix. It is easy to see that in case $X'X$ is diagonal we have

$$\tilde\theta_{AL,i} = \hat\theta_{LS,i}\left(1 - \hat\sigma^2\xi_{i,n}^2(\eta'_{i,n})^2/\hat\theta_{LS,i}^2\right)_+.$$

Hence, in the case of diagonal $X'X$, the components $\tilde\theta_{AL,i}$ of the adaptive Lasso reduce to the
adaptive soft-thresholding estimators $\tilde\theta_{AS,i}$ (for $\eta'_{i,n} = \eta_{i,n}$). Therefore all results derived below
for adaptive soft-thresholding immediately give corresponding results for the adaptive Lasso in
the diagonal case. We shall again abstain from spelling out further details. Similar results
obviously hold for the infeasible versions of the estimators.

Remark 3 (Other estimators) (i) The adaptive Lasso as defined in Zou (2006) has an additional
tuning parameter $\gamma$. We consider adaptive soft-thresholding only for the case $\gamma = 1$, since
otherwise the estimator is not equivariant in the sense described above. Nonetheless an analysis
for the case $\gamma \neq 1$, similar to the analysis in this paper, is possible in principle.

(ii) An analysis of a SCAD-based thresholding estimator is given in Potscher and Leeb (2009)
in the known-variance case. [These results are given in the orthogonal design case, but easily
generalize to the non-orthogonal case.] The results obtained there for SCAD-based thresholding
are similar in spirit to the results for the other thresholding estimators considered here. The
unknown-variance case could also be analyzed in principle, but we refrain from doing so for the
sake of brevity.

(iii) Zhang (2010) introduced the so-called minimax concave penalty (MCP) to be used for
penalized least-squares estimation. Apart from the usual tuning parameter, MCP also depends
on a shape parameter $\gamma$. It turns out that the thresholding estimator based on MCP coincides
with hard-thresholding in case $\gamma \leq 1$, and thus is covered by the analysis of the present paper.
In case $\gamma > 1$, the MCP-based thresholding estimator could similarly be analyzed, especially
since the functional form of the MCP-based thresholding estimator is relatively simple (namely,
a piecewise linear function of the least-squares estimator). We do not provide such an analysis
for brevity.

For all asymptotic considerations in this paper we shall always assume without further
mentioning that $\xi_{i,n}^2/n = ((X'X)^{-1})_{ii}$ satisfies

$$\sup_n \xi_{i,n}^2/n < \infty \tag{3}$$

for every fixed $i \geq 1$ satisfying $i \leq k(n)$ for large enough $n$. The case excluded by assumption
(3) seems to be rather uninteresting as unboundedness of $\xi_{i,n}^2/n$ means that the information
contained in the regressors gets weaker with increasing sample size (at least along a subsequence);
in particular, this implies (coordinate-wise) inconsistency of the least-squares estimator. [In fact,
if $k$ as well as the elements of $X$ do not depend on $n$, this case is actually impossible as $\xi_{i,n}^2/n$
is then necessarily monotonically nonincreasing.]

The following notation will be used in the paper: Let $\bar{\mathbb{R}}$ denote the extended real line
$\mathbb{R} \cup \{-\infty, \infty\}$ endowed with the usual topology. On $\mathbb{N} \cup \{\infty\}$ we shall consider the topology
it inherits from $\bar{\mathbb{R}}$. Furthermore, $\Phi$ and $\phi$ denote the cumulative distribution function (cdf) and
the probability density function (pdf) of a standard normal distribution, respectively. By $T_{m,c}$ we
denote the cdf of a non-central $t$-distribution with $m \in \mathbb{N}$ degrees of freedom and non-centrality
parameter $c \in \mathbb{R}$. In the central case, i.e., $c = 0$, we simply write $T_m$. We use the convention
$\Phi(\infty) = 1$, $\Phi(-\infty) = 0$ with a similar convention for $T_{m,c}$.

3 Variable Selection Probabilities 

The estimators $\tilde\theta_H$, $\tilde\theta_S$, and $\tilde\theta_{AS}$ can be viewed as performing variable selection in the sense that
these estimators set components of $\theta$ exactly equal to zero with positive probability. In this
section we study the variable selection probability $P_{n,\theta,\sigma}(\tilde\theta_i \neq 0)$, where $\tilde\theta_i$ stands for any of
the estimators $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and $\tilde\theta_{AS,i}$. Since these probabilities are the same for any of the three
estimators considered we shall drop the subscripts $H$, $S$, and $AS$ in this section. We use the
same convention also for the variable selection probabilities of the infeasible versions.

3.1 Known-Variance Case

Since $P_{n,\theta,\sigma}(\hat\theta_i \neq 0) = 1 - P_{n,\theta,\sigma}(\hat\theta_i = 0)$ it suffices to study the variable deletion probability

$$P_{n,\theta,\sigma}(\hat\theta_i = 0) = \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}\right)\right). \tag{4}$$

As can be seen from the above formula, $P_{n,\theta,\sigma}(\hat\theta_i = 0)$ depends on $\theta$ only via $\theta_i$. We
first study the variable selection/deletion probabilities under a "fixed-parameter" asymptotic
framework.
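
Formula (4) is straightforward to evaluate, and it can be cross-checked by simulation, because in the model of Section 2 the $i$-th least-squares component is normal with mean $\theta_i$ and standard deviation $\sigma\xi_{i,n}/n^{1/2}$. The following sketch uses only the Python standard library; the function names, the number of draws, and the seed are illustrative assumptions.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def deletion_probability(theta_i, sigma, xi, eta, n):
    """Right-hand side of (4): probability that the infeasible estimator is zero."""
    root_n = math.sqrt(n)
    centre = -theta_i / (sigma * xi)
    return (std_normal_cdf(root_n * (centre + eta))
            - std_normal_cdf(root_n * (centre - eta)))


def simulated_deletion_frequency(theta_i, sigma, xi, eta, n, draws=100_000, seed=0):
    """Monte Carlo check: draw theta_LS,i ~ N(theta_i, sigma^2 xi^2 / n) and threshold it."""
    rng = random.Random(seed)
    sd = sigma * xi / math.sqrt(n)
    threshold = sigma * xi * eta
    hits = sum(abs(rng.gauss(theta_i, sd)) <= threshold for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    args = dict(theta_i=0.3, sigma=1.0, xi=1.0, eta=0.4, n=25)  # illustrative values
    print(deletion_probability(**args), simulated_deletion_frequency(**args))
```

The two printed numbers should agree up to Monte Carlo error, and changing the other components of $\theta$ has no effect, in line with the remark that the probability depends on $\theta$ only via $\theta_i$.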

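Proposition 4 Let $0 < \sigma < \infty$ be given. For every $i \geq 1$ satisfying $i \leq k = k(n)$ for large
enough $n$ we have:

(a) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\hat\theta_i = 0) \to 0$ as $n \to \infty$ for all $\theta$ satisfying
$\theta_i \neq 0$ ($\theta_i$ not depending on $n$) is $\xi_{i,n}\eta_{i,n} \to 0$.

(b) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\hat\theta_i = 0) \to 1$ as $n \to \infty$ for all $\theta$ satisfying
$\theta_i = 0$ is $n^{1/2}\eta_{i,n} \to \infty$.

(c) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\hat\theta_i = 0) \to c_i < 1$ as $n \to \infty$ for all
$\theta$ satisfying $\theta_i = 0$ is $n^{1/2}\eta_{i,n} \to e_i$, $0 \leq e_i < \infty$. The constant $c_i$ is then given by
$c_i = \Phi(e_i) - \Phi(-e_i)$.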

Part (a) of the above proposition gives a necessary and sufficient condition for the procedure to 
correctly detect nonzero coefficients with probability converging to 1. Part (b) gives a necessary 
and sufficient condition for correctly detecting zero coefficients with probability converging to 1. 
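
The fixed-parameter statements of Proposition 4 can be illustrated by evaluating the deletion probability (4) along two tuning sequences. The sketch below assumes $\sigma = \xi_{i,n} = 1$ and uses $\eta_{i,n} = e/n^{1/2}$ as an example of conservative tuning and $\eta_{i,n} = n^{-1/4}$ as an example of consistent tuning; these particular sequences and all function names are illustrative choices, not taken from the paper.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def deletion_probability(theta_i, eta, n, sigma=1.0, xi=1.0):
    """Formula (4) of the text."""
    root_n = math.sqrt(n)
    centre = -theta_i / (sigma * xi)
    return (std_normal_cdf(root_n * (centre + eta))
            - std_normal_cdf(root_n * (centre - eta)))


def eta_conservative(n, e=1.0):
    """n^{1/2} * eta equals e for every n."""
    return e / math.sqrt(n)


def eta_consistent(n):
    """eta -> 0 while n^{1/2} * eta = n^{1/4} -> infinity."""
    return n ** -0.25


def table(theta_i, sample_sizes=(10 ** 2, 10 ** 4, 10 ** 6)):
    """Rows (n, probability under conservative tuning, probability under consistent tuning)."""
    return [(n,
             deletion_probability(theta_i, eta_conservative(n), n),
             deletion_probability(theta_i, eta_consistent(n), n))
            for n in sample_sizes]


if __name__ == "__main__":
    for theta_i in (0.0, 0.5):
        for row in table(theta_i):
            print(theta_i, row)
```

For $\theta_i = 0$ the conservative column stays at $\Phi(1) - \Phi(-1)$ for every $n$ while the consistent column approaches 1; for the fixed nonzero value both columns approach 0, as Parts (a) to (c) predict.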
Remark 5 If $\xi_{i,n}/n^{1/2}$ does not converge to zero, the conditions on $\eta_{i,n}$ in Parts (a) and (b)
are incompatible; also the conditions in Parts (a) and (c) are then incompatible (except when
$e_i = 0$). However, the case where $\xi_{i,n}/n^{1/2}$ does not converge to zero is of little interest as the
least-squares estimator $\hat\theta_{LS,i}$ is then not consistent.

Remark 6 (Speed of convergence in Proposition 4) (i) The speed of convergence in (a) is $\xi_{i,n}\eta_{i,n}$
in case $n^{1/2}/\xi_{i,n}$ is bounded (an uninteresting case as noted above); if $n^{1/2}/\xi_{i,n} \to \infty$ the speed of
convergence in (a) is not slower than $\exp(-cn\xi_{i,n}^{-2})/(n^{1/2}\xi_{i,n}^{-1})$ for some suitable $c > 0$ depending
on $\theta_i/\sigma$.

(ii) The speed of convergence in (b) is $\exp(-0.5n\eta_{i,n}^2)/(n^{1/2}\eta_{i,n})$. In (c) the speed of
convergence is given by the rate at which $n^{1/2}\eta_{i,n}$ approaches $e_i$.

[For the above results we have made use of Lemma VII.1.2 in Feller (1957).]

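Remark 7 For $\theta \in \mathbb{R}^{k(n)}$ let $A_n(\theta) = \{i : 1 \leq i \leq k(n), \theta_i \neq 0\}$. Then (i) for every $i \in A_n(\theta)$

$$P_{n,\theta,\sigma}(\hat\theta_i = 0) \leq P_{n,\theta,\sigma}\Big(\bigcup_{j \in A_n(\theta)}\{\hat\theta_j = 0\}\Big) \leq \sum_{j \in A_n(\theta)} P_{n,\theta,\sigma}(\hat\theta_j = 0).$$

Suppose now that the entries of $\theta$ do not change with $n$ (although the dimension of $\theta$ may depend
on $n$).^1 Then, given that $\operatorname{card}(A_n(\theta))$ is bounded (this being in particular the case if $k(n)$ is
bounded), the probability of incorrect non-detection of at least one nonzero coefficient converges
to 0 if and only if $\xi_{i,n}\eta_{i,n} \to 0$ as $n \to \infty$ for every $i \in A_n(\theta)$. [If $\operatorname{card}(A_n(\theta))$ is unbounded
then this probability converges to 0, e.g., if $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\xi_{i,n}^{-1} \to \infty$ as $n \to \infty$ for every
$i \in A_n(\theta)$ and $\inf_{i \in A_n(\theta)}|\theta_i| > 0$ and $\sum_{j \in A_n(\theta)}\exp(-cn\xi_{j,n}^{-2})/(n^{1/2}\xi_{j,n}^{-1}) \to 0$ as $n \to \infty$ for a
suitable $c$ that is determined by $\inf_{j \in A_n(\theta)}|\theta_j|/\sigma$.]

(ii) For every $i \notin A_n(\theta)$ we have

$$P_{n,\theta,\sigma}(\hat\theta_i = 0) \geq P_{n,\theta,\sigma}\Big(\bigcap_{j \notin A_n(\theta)}\{\hat\theta_j = 0\}\Big) = 1 - P_{n,\theta,\sigma}\Big(\bigcup_{j \notin A_n(\theta)}\{\hat\theta_j \neq 0\}\Big).$$

Suppose again that the entries of $\theta$ do not change with $n$. Then, given that $\operatorname{card}(A_n^c(\theta))$ is
bounded (this being in particular the case if $k(n)$ is bounded), the probability of incorrectly
classifying at least one zero parameter as a non-zero one converges to 0 as $n \to \infty$ if and only if
$n^{1/2}\eta_{i,n} \to \infty$ for every $i \notin A_n(\theta)$. [If $\operatorname{card}(A_n^c(\theta))$ is unbounded then this probability converges
to 0, e.g., if $\sum_{j \notin A_n(\theta)}\exp(-0.5n\eta_{j,n}^2)/(n^{1/2}\eta_{j,n}) \to 0$ as $n \to \infty$.]

(iii) In case $X'X$ is diagonal, the relevant probabilities $P_{n,\theta,\sigma}\big(\bigcup_{i \in A_n(\theta)}\{\hat\theta_i = 0\}\big)$ as well as
$P_{n,\theta,\sigma}\big(\bigcap_{i \notin A_n(\theta)}\{\hat\theta_i = 0\}\big)$ can be directly expressed in terms of products of $P_{n,\theta,\sigma}(\hat\theta_i \neq 0)$ or
$P_{n,\theta,\sigma}(\hat\theta_i = 0)$, and Proposition 4 can then be applied.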
Since the fixed-parameter asymptotic framework often gives a misleading impression of the
actual behavior of a variable selection procedure (cf. Leeb and Potscher (2005), Potscher and
Leeb (2009)) we turn to a "moving-parameter" framework next, i.e., we allow the elements of
$\theta$ as well as $\sigma$ to depend on sample size $n$. In the proposition to follow (and all subsequent
large-sample results) we shall concentrate only on the case where $\xi_{i,n}\eta_{i,n} \to 0$ as $n \to \infty$, since
otherwise the estimators $\hat\theta_i$ are not even consistent for $\theta_i$ as a consequence of Proposition 4,
cf. also Theorem 16 below. Given the condition $\xi_{i,n}\eta_{i,n} \to 0$, we shall then distinguish between
the case $n^{1/2}\eta_{i,n} \to e_i$, $0 \leq e_i < \infty$, and the case $n^{1/2}\eta_{i,n} \to \infty$, which in light of Proposition 4
we shall call the case of "conservative tuning" and the case of "consistent tuning", respectively.^2

^1 More precisely, this means that $\theta$ is made up of the initial $k(n)$ elements of a fixed element of $\mathbb{R}^{\infty}$.



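Proposition 8 Suppose that for given $i \geq 1$ satisfying $i \leq k = k(n)$ for large enough $n$ we have
$\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \leq e_i \leq \infty$.

(a) Assume $e_i < \infty$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0, \infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$. Then

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\hat\theta_i = 0) = \Phi(-\nu_i + e_i) - \Phi(-\nu_i - e_i).$$

(b) Assume $e_i = \infty$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0, \infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$. Then

1. $|\zeta_i| < 1$ implies $\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\hat\theta_i = 0) = 1$.

2. $|\zeta_i| > 1$ implies $\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\hat\theta_i = 0) = 0$.

3. $|\zeta_i| = 1$ and $r_{i,n} := n^{1/2}\left(\eta_{i,n} - \zeta_i\theta_{i,n}/(\sigma_n\xi_{i,n})\right) \to r_i$ for some $r_i \in \bar{\mathbb{R}}$, imply

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\hat\theta_i = 0) = \Phi(r_i).$$
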

In a fixed-parameter asymptotic analysis, which in Proposition 8 corresponds to the case
$\theta_{i,n} \equiv \theta_i$ and $\sigma_n \equiv \sigma$, the limit of the probabilities $P_{n,\theta,\sigma}(\hat\theta_i = 0)$ is always 0 in case $\theta_i \neq 0$, and
is 1 in case $\theta_i = 0$ and consistent tuning (it is $\Phi(e_i) - \Phi(-e_i)$ in case $\theta_i = 0$ and conservative
tuning); this clearly does not properly capture the finite-sample behavior of these probabilities.
The moving-parameter asymptotic analysis underlying Proposition 8 better captures the
finite-sample behavior and, e.g., allows for limits other than 0 and 1 even in the case of consistent
tuning. In particular, Proposition 8 shows that the convergence of the variable selection/deletion
probabilities to their limits in a fixed-parameter asymptotic framework is not uniform in $\theta_i$, and
this non-uniformity is local in the sense that it occurs in an arbitrarily small neighborhood of
$\theta_i = 0$ (holding the value of $\sigma > 0$ fixed).^3 Furthermore, the above proposition entails that
under consistent tuning deviations from $\theta_i = 0$ of larger order than under conservative tuning go
unnoticed asymptotically with probability 1 by the variable selection procedure corresponding
to $\hat\theta_i$. For more discussion in a special case (which in its essence also applies here) see Potscher
and Leeb (2009).
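
The moving-parameter limits in Part (b) of Proposition 8 can be made visible numerically by letting $\theta_{i,n}$ shrink at the same rate as the threshold. The sketch below assumes $\sigma_n = \xi_{i,n} = 1$, consistent tuning with $\eta_{i,n} = n^{-1/4}$, and the parameter sequence $\theta_{i,n} = \zeta(\eta_{i,n} - r/n^{1/2})$, chosen so that $\theta_{i,n}/\eta_{i,n} \to \zeta$ and, when $|\zeta| = 1$, the quantity $r_{i,n}$ of the proposition equals $r$. These sequences and the function names are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def deletion_probability(theta_i, eta, n, sigma=1.0, xi=1.0):
    """Formula (4) of the text."""
    root_n = math.sqrt(n)
    centre = -theta_i / (sigma * xi)
    return (std_normal_cdf(root_n * (centre + eta))
            - std_normal_cdf(root_n * (centre - eta)))


def local_parameter(zeta, eta, n, r=0.0, sigma=1.0, xi=1.0):
    """theta_{i,n} = zeta * sigma * xi * (eta - r / n^{1/2})."""
    return zeta * sigma * xi * (eta - r / math.sqrt(n))


def moving_parameter_probability(zeta, n, r=0.0):
    eta = n ** -0.25  # consistent tuning: eta -> 0, n^{1/2} * eta -> infinity
    return deletion_probability(local_parameter(zeta, eta, n, r), eta, n)


if __name__ == "__main__":
    n = 10 ** 8
    for zeta, r in ((0.5, 0.0), (2.0, 0.0), (1.0, 0.0), (1.0, 1.0), (-1.0, 1.0)):
        print(zeta, r, moving_parameter_probability(zeta, n, r))
```

Although every $\theta_{i,n}$ used here is nonzero, the deletion probability is close to 1 for $|\zeta| < 1$ and takes the intermediate value $\Phi(r)$ on the boundary $|\zeta| = 1$, which is exactly the behavior that the fixed-parameter analysis hides.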

Remark 9 (Speed of convergence in Proposition 8) (i) The speed of convergence in (a) is given
by the slower of the rate at which $n^{1/2}\eta_{i,n}$ approaches $e_i$ and $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})$ approaches $\nu_i$
provided that $|\nu_i| < \infty$; if $|\nu_i| = \infty$, the speed of convergence is not slower than

$$\exp\left(-cn\theta_{i,n}^2/(\sigma_n^2\xi_{i,n}^2)\right) / \left|n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})\right|$$

for any $c < 1/2$.

(ii) The speed of convergence in (b1) is not slower than $\exp(-cn\eta_{i,n}^2)/(n^{1/2}\eta_{i,n})$ where $c$
depends on $\zeta_i$. The same is true in case (b2) provided $|\zeta_i| < \infty$; if $|\zeta_i| = \infty$, the speed of
convergence is not slower than $\exp\left(-cn\theta_{i,n}^2/(\sigma_n^2\xi_{i,n}^2)\right) / \left|n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})\right|$ for every $c < 1/2$. In
case (b3) the speed of convergence is not slower than the speed of convergence of

$$\max\left(\exp(-cn\eta_{i,n}^2)/(n^{1/2}\eta_{i,n}),\ |r_{i,n} - r_i|\right)$$

for any $c < 2$ in case $|r_i| < \infty$; in case $|r_i| = \infty$ it is not slower than

$$\max\left(\exp(-cn\eta_{i,n}^2)/(n^{1/2}\eta_{i,n}),\ \exp(-0.5r_{i,n}^2)/|r_{i,n}|\right)$$

for any $c < 2$.

^2 There is no loss of generality here in assuming convergence of $n^{1/2}\eta_{i,n}$ to a (finite or infinite) limit, in the
sense that this convergence can, for any given sequence $n^{1/2}\eta_{i,n}$, be achieved along suitable subsequences in light
of compactness of the extended real line.

^3 More generally, the non-uniformity arises for $\theta_i/\sigma$ in a neighborhood of zero.

The preceding remark corrects and clarifies the remarks at the end of Section 3 in Potscher 
and Leeb (2009) and Section 3.1 in Potscher and Schneider (2009). 



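3.2 Unknown-Variance Case

In the unknown-variance case the finite-sample variable selection/deletion probabilities can be
obtained as follows:

$$\begin{aligned}
P_{n,\theta,\sigma}(\tilde\theta_i = 0) &= P_{n,\theta,\sigma}\left(|\hat\theta_{LS,i}| \leq \hat\sigma\xi_{i,n}\eta_{i,n}\right) \\
&= \int_0^\infty P_{n,\theta,\sigma}\left(|\hat\theta_{LS,i}| \leq s\sigma\xi_{i,n}\eta_{i,n}\right)\rho_{n-k}(s)\,ds \\
&= \int_0^\infty \left[\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n}\right)\right)\right]\rho_{n-k}(s)\,ds \\
&= T_{n-k,\,n^{1/2}\theta_i/(\sigma\xi_{i,n})}\left(n^{1/2}\eta_{i,n}\right) - T_{n-k,\,n^{1/2}\theta_i/(\sigma\xi_{i,n})}\left(-n^{1/2}\eta_{i,n}\right).
\end{aligned} \tag{5}$$

Here we have used (4), and independence of $\hat\sigma$ and $\hat\theta_{LS,i}$ allowed us to replace $\hat\sigma$ by $s\sigma$ in the
relevant formulae, cf. Leeb and Potscher (2003, p. 110). In the above $\rho_{n-k}$ denotes the density
of $(n - k)^{-1/2}$ times the square root of a chi-square distributed random variable with $n - k$
degrees of freedom. It will turn out to be convenient to set $\rho_{n-k}(s) = 0$ for $s < 0$, making $\rho_{n-k}$
a bounded continuous function on $\mathbb{R}$.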
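
The averaging over the variance estimator described above can be carried out numerically. The sketch below assumes the density of $(n-k)^{-1/2}$ times the square root of a chi-square variable with $m = n - k$ degrees of freedom has the explicit form $\rho_m(s) = 2(m/2)^{m/2}\Gamma(m/2)^{-1}s^{m-1}e^{-ms^2/2}$ for $s > 0$ (obtained by a change of variables from the chi-square density), and integrates the known-variance formula (4), with $\eta_{i,n}$ replaced by $s\eta_{i,n}$, against it by Simpson's rule. The function names, the truncation point of the integral, and the number of Simpson intervals are illustrative choices.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rho(s, m):
    """Density of sqrt(chi2_m / m); set to zero for s < 0 as in the text."""
    if s < 0.0:
        return 0.0
    if s == 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    log_value = (math.log(2.0) + 0.5 * m * math.log(0.5 * m) - math.lgamma(0.5 * m)
                 + (m - 1) * math.log(s) - 0.5 * m * s * s)
    return math.exp(log_value)


def simpson(f, a, b, intervals=4000):
    """Composite Simpson rule on [a, b] with an even number of intervals."""
    if intervals % 2:
        intervals += 1
    h = (b - a) / intervals
    total = f(a) + f(b)
    for j in range(1, intervals):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0


def deletion_probability_unknown_variance(theta_i, sigma, xi, eta, n, k):
    """Known-variance formula with eta replaced by s * eta, averaged over rho_{n-k}."""
    m = n - k
    root_n = math.sqrt(n)
    centre = -theta_i / (sigma * xi)

    def integrand(s):
        return (std_normal_cdf(root_n * (centre + s * eta))
                - std_normal_cdf(root_n * (centre - s * eta))) * rho(s, m)

    upper = 1.0 + 12.0 / math.sqrt(2.0 * m)  # mass beyond this point is negligible
    return simpson(integrand, 0.0, upper)


if __name__ == "__main__":
    # theta_i = 0, n - k = 1 and n^{1/2} * eta = 1: the statistic is Cauchy distributed
    print(deletion_probability_unknown_variance(0.0, 1.0, 1.0, 1.0 / math.sqrt(5.0), 5, 4))
```

For $\theta_i = 0$ the result can be compared with central $t$-probabilities, which are available in closed form for one and two degrees of freedom; with many degrees of freedom the value is close to the known-variance formula.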

We now have the following fixed-parameter asymptotic result for the variable selection/deletion
probabilities in the unknown-variance case that perfectly parallels the corresponding result in
the known-variance case, i.e., Proposition 4.

Proposition 10 Let $0 < \sigma < \infty$ be given. For every $i \geq 1$ satisfying $i \leq k = k(n)$ for large
enough $n$ we have:

(a) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\tilde\theta_i = 0) \to 0$ as $n \to \infty$ for all $\theta$ satisfying
$\theta_i \neq 0$ ($\theta_i$ not depending on $n$) is $\xi_{i,n}\eta_{i,n} \to 0$.

(b) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\tilde\theta_i = 0) \to 1$ as $n \to \infty$ for all $\theta$ satisfying
$\theta_i = 0$ is $n^{1/2}\eta_{i,n} \to \infty$.

(c) A necessary and sufficient condition for $P_{n,\theta,\sigma}(\tilde\theta_i = 0) - c_{i,n} \to 0$ as $n \to \infty$ for all
$\theta$ satisfying $\theta_i = 0$ and with $c_{i,n} = T_{n-k}(e_i) - T_{n-k}(-e_i)$ satisfying $\limsup_{n \to \infty} c_{i,n} < 1$ is
$n^{1/2}\eta_{i,n} \to e_i$, $0 \leq e_i < \infty$.



Proposition 10 shows that the dichotomy regarding conservative tuning and consistent tuning
is expressed by the same conditions in the unknown-variance case as in the known-variance
case. Furthermore, note that $c_{i,n}$ appearing in Part (c) of the above proposition converges to
$c_i = \Phi(e_i) - \Phi(-e_i)$ in the case where $n - k \to \infty$, the limit thus being the same as in the
known-variance case. This is different in case $n - k$ is constant equal to $m$, say, eventually, the
sequence $c_{i,n}$ then being constant equal to $T_m(e_i) - T_m(-e_i)$ eventually. We finally note that
Remark 5 also applies to Proposition 10 above.

For the same reasons as in the known-variance case we next investigate the asymptotic behavior
of the variable selection/deletion probabilities under a moving-parameter asymptotic framework.
We consider the case where $n - k$ is (eventually) constant and the case where $n - k \to \infty$.
There is no essential loss in generality in considering these two cases only, since by compactness
of $\mathbb{N} \cup \{\infty\}$ we can always assume (possibly after passing to subsequences) that $n - k$ converges
in $\mathbb{N} \cup \{\infty\}$.

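Theorem 11 Suppose that for given $i \geq 1$ satisfying $i \leq k = k(n)$ for large enough $n$ we have
$\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \leq e_i \leq \infty$.

(a) Assume $e_i < \infty$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0, \infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$.

(a1) If $n - k$ is eventually constant equal to $m$, say, then

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = \int_0^\infty \left(\Phi(-\nu_i + se_i) - \Phi(-\nu_i - se_i)\right)\rho_m(s)\,ds.$$

(a2) If $n - k \to \infty$ holds, then

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = \Phi(-\nu_i + e_i) - \Phi(-\nu_i - e_i).$$

(b) Assume $e_i = \infty$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0, \infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

(b1) If $n - k$ is eventually constant equal to $m$, say, then

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = \int_{|\zeta_i|}^\infty \rho_m(s)\,ds = \Pr(\chi_m^2 > m\zeta_i^2).$$

(b2) If $n - k \to \infty$ holds, then

1. $|\zeta_i| < 1$ implies $\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = 1$.

2. $|\zeta_i| > 1$ implies $\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = 0$.

3. $|\zeta_i| = 1$ and $n^{1/2}\eta_{i,n}/(n - k)^{1/2} \to 0$ imply

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0) = \Phi(r_i)$$

provided $r_{i,n} := n^{1/2}\left(\eta_{i,n} - \zeta_i\theta_{i,n}/(\sigma_n\xi_{i,n})\right) \to r_i$ for some $r_i \in \bar{\mathbb{R}}$.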
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4- \Ci\ = 1 o-nd n^/'^rj^ „/ (n — k)^^^ — > with < < oo imply 



provided Vi^n fi for some S M. [Note that the integral in the above display reduces to 1 if 
ri = oo, and to if r.^ — —oo.] 

5. ICil = 1 o^iT-d n^^'^Vi.rJ ~ k)^^^ — > oo imply 

lim P„,,<.,,.„ (e,; = O) = <I>(r:) 

provided (^n^^^rj^^ „/ (n — fc)"'^^'^^ ri.„ — > 2^"'^/^r^ for some r[ € M. 



Theorem 11 shows, in particular, that also in the unknown-variance case the convergence
of the variable selection/deletion probabilities to their limits in a fixed-parameter asymptotic
framework is not locally uniform in $\theta_i$. In the case of conservative tuning the theorem furthermore
shows that the limit of the variable selection/deletion probabilities in the unknown-variance case
is the same as in the known-variance case if the degrees of freedom $n - k$ go to infinity (entailing
that the distribution of $\hat\sigma/\sigma$ concentrates more and more around 1); if $n - k$ is eventually constant,
the limit turns out to be a mixture of the known-variance case limits (with $\sigma$ replaced by $s\sigma$),
the mixture being with respect to the distribution of $\hat\sigma/\sigma$. [We note that in the somewhat
uninteresting case $e_i = 0$ this mixture also reduces to the same limit as in the known-variance
case.] While this result is as one would expect, the situation is different and more subtle in the
case of consistent tuning: If $n - k \to \infty$ the limits are the same as in the known-variance case
if $|\zeta_i| < 1$ or $|\zeta_i| > 1$ holds, namely 1 and 0, respectively. However, in the "boundary" case
$|\zeta_i| = 1$ the rate at which $n - k$ diverges to infinity becomes relevant. If the divergence is fast
enough in the sense that $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to 0$, again the same limit as in the known-variance
case, namely $\Phi(r_i)$, is obtained; but if $n - k$ diverges to infinity more slowly, a different limit
arises (which, e.g., in case 4 of Part (b2) is obtained by averaging $\Phi(r_i + \cdot)$ with respect to a
suitable distribution). The case where the degrees of freedom $n - k$ is eventually constant looks
very much different from the known-variance case and again some averaging with respect to the
distribution of $\hat\sigma/\sigma$ takes place. Note that in this case the limiting variable deletion probabilities
are 1 and 0, respectively, only if $\zeta_i = 0$ and $|\zeta_i| = \infty$, respectively, which is in contrast to the
known-variance case (and the unknown-variance case with $n - k \to \infty$).
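The limits in Theorem 11(a1), (a2), and (b1) can be evaluated numerically. The following is a minimal sketch, not part of the original analysis: it assumes that $\rho_m$ is the density of $(\chi^2_m/m)^{1/2}$, i.e., of $\hat\sigma/\sigma$ with $m$ degrees of freedom, and uses a plain midpoint rule; the function names and the integration cut-off are illustrative choices.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rho(s, m):
    """Assumed density of sqrt(chi2_m / m), the law of sigma_hat/sigma with m d.o.f."""
    if s <= 0.0:
        return 0.0
    log_density = (math.log(2.0) + 0.5 * m * math.log(0.5 * m) - math.lgamma(0.5 * m)
                   + (m - 1.0) * math.log(s) - 0.5 * m * s * s)
    return math.exp(log_density)

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]; never evaluates f at the endpoints."""
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) for j in range(steps))

def upper_limit(m):
    """Heuristic cut-off beyond which rho(., m) carries negligible mass."""
    return 1.0 + 12.0 / math.sqrt(m)

def deletion_limit_conservative(nu, e, m=None):
    """Theorem 11(a): m=None gives case (a2); an integer m gives case (a1)."""
    if m is None:
        return Phi(-nu + e) - Phi(-nu - e)
    return integrate(lambda s: (Phi(-nu + s * e) - Phi(-nu - s * e)) * rho(s, m),
                     0.0, upper_limit(m))

def deletion_limit_consistent(zeta, m):
    """Theorem 11(b1): mass of rho_m on (|zeta|, infinity), i.e. Pr(chi2_m > m zeta^2)."""
    b = upper_limit(m)
    return integrate(lambda s: rho(s, m), min(abs(zeta), b), b)

if __name__ == "__main__":
    for m in (1, 5, 50):
        print(m, deletion_limit_conservative(0.5, 1.0, m), deletion_limit_consistent(1.0, m))
    print("n-k -> infinity:", deletion_limit_conservative(0.5, 1.0))
```

For small $m$ the mixture in (a1) differs visibly from the (a2) limit, while for large $m$ the two agree closely, in line with the discussion above.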



Remark 12 (i) For later use we note that Proposition 8 and Theorem 11 also hold when applied
to subsequences, as is easily seen.

(ii) The convergence conditions in Proposition 8 on the various quantities involving $\theta_{i,n}$
and $\sigma_n$ are essentially cost-free in the sense that given any sequence $(\theta_{i,n}, \sigma_n)$ we can, due to
compactness of $\bar{\mathbb{R}}$, select from any subsequence a further subsubsequence such that along
this subsubsequence all relevant quantities such as $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})$ (or $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})$ and
$r_{i,n}$) converge in $\bar{\mathbb{R}}$. Since Proposition 8 also holds when applied to subsequences as just noted,
an application of this proposition to the subsubsequence then results in a characterization
of all possible accumulation points of the variable selection/deletion probabilities in the
known-variance case.

(iii) In a similar manner, the convergence conditions in Theorem 11 (including the ones on
$n - k$) are essentially cost-free, and thus this theorem provides a full characterization of all possible
accumulation points of the variable selection/deletion probabilities in the unknown-variance case.
As just discussed, in the case of conservative tuning we get the same limiting behavior under
moving-parameter asymptotics in the known-variance and in the unknown-variance case along
any sequence of parameters if $n - k \to \infty$ or $e_i = 0$ (which in the conservatively tuned case can
equivalently be stated as $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to 0$). In the case of consistent tuning the same
coincidence of limits occurs if $n - k \to \infty$ fast enough such that $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to 0$. This
is not accidental but a consequence of the following fact:



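Proposition 13 Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we
have $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ as $n \to \infty$. Then

$$\sup_{\theta\in\mathbb{R}^k,\,0<\sigma<\infty} \left| P_{n,\theta,\sigma}(\hat\theta_i = 0) - P_{n,\theta,\sigma}(\tilde\theta_i = 0) \right| \to 0$$

for $n \to \infty$.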



Remark 14 Suppose that $\xi_{i,n}\eta_{i,n} \to 0$ holds as $n \to \infty$, the other case being of little interest
as noted earlier. If $n^{1/2}\eta_{i,n}(n-k)^{-1/2}$ does not converge to zero as $n \to \infty$, it can be shown
from Proposition 8 and Theorem 11 that the limits of the variable deletion probabilities (along
appropriate (sub)sequences $(\theta^{(n)}, \sigma_n)$) for the known-variance and the unknown-variance case
do not coincide. This shows that the condition $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ in the above proposition
cannot be weakened (at least in case $\xi_{i,n}\eta_{i,n} \to 0$ holds).



4 Consistency, Uniform Consistency, and Uniform Convergence Rate

For purposes of comparison we start with the following obvious proposition, which immediately
follows from the observation that $\hat\theta_{LS,i}$ is $N(\theta_i, \sigma^2\xi_{i,n}^2/n)$-distributed.

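Proposition 15 For every $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we have the
following:

(a) $\xi_{i,n}/n^{1/2} \to 0$ is a necessary and sufficient condition for $\hat\theta_{LS,i}$ to be consistent for $\theta_i$, the
convergence rate being $\xi_{i,n}/n^{1/2}$.

(b) Suppose $\xi_{i,n}/n^{1/2} \to 0$. Then $\hat\theta_{LS,i}$ is uniformly consistent for $\theta_i$ in the sense that for
every $\varepsilon > 0$

$$\lim_{n\to\infty} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(\left|\hat\theta_{LS,i} - \theta_i\right| > \sigma\varepsilon\right) = 0.$$

In fact, $\hat\theta_{LS,i}$ is uniformly $n^{1/2}/\xi_{i,n}$-consistent for $\theta_i$ in the sense that for every $\varepsilon > 0$ there
exists a real number $M > 0$ such that

$$\sup_{n\in\mathbb{N}} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left((n^{1/2}/\xi_{i,n})\left|\hat\theta_{LS,i} - \theta_i\right| > \sigma M\right) < \varepsilon.$$

[Note that the probabilities in the displays above in fact depend on neither $\theta$ nor $\sigma$. In particular,
the l.h.s. of the above displays equal $2\Phi(-\varepsilon n^{1/2}/\xi_{i,n})$ and $2\Phi(-M)$, respectively.]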

The corresponding result for the estimators $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, or $\tilde\theta_{AS,i}$ and their infeasible
counterparts $\hat\theta_{H,i}$, $\hat\theta_{S,i}$, or $\hat\theta_{AS,i}$ is now as follows.

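Theorem 16 Let $\tilde\theta_i$ stand for any of the estimators $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, or $\tilde\theta_{AS,i}$. Then for every $i \ge 1$
satisfying $i \le k = k(n)$ for large enough $n$ we have the following:

(a) $\tilde\theta_i$ is consistent for $\theta_i$ if and only if $\xi_{i,n}\eta_{i,n} \to 0$ and $\xi_{i,n}/n^{1/2} \to 0$.

(b) Suppose $\xi_{i,n}\eta_{i,n} \to 0$ and $\xi_{i,n}/n^{1/2} \to 0$. Then $\tilde\theta_i$ is uniformly consistent in the sense
that for every $\varepsilon > 0$

$$\lim_{n\to\infty} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(\left|\tilde\theta_i - \theta_i\right| > \sigma\varepsilon\right) = 0.$$

Furthermore, $\tilde\theta_i$ is uniformly $a_{i,n}$-consistent with $a_{i,n} = \min\left(n^{1/2}/\xi_{i,n}, (\xi_{i,n}\eta_{i,n})^{-1}\right)$ in the sense
that for every $\varepsilon > 0$ there exists a real number $M > 0$ such that

$$\sup_{n\in\mathbb{N}} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(a_{i,n}\left|\tilde\theta_i - \theta_i\right| > \sigma M\right) < \varepsilon.$$

(c) Suppose $\xi_{i,n}\eta_{i,n} \to 0$ and $\xi_{i,n}/n^{1/2} \to 0$ and $b_{i,n} \ge 0$. If for every $\varepsilon > 0$ there exists a
real number $M > 0$ such that

$$\limsup_{n\to\infty} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(b_{i,n}\left|\tilde\theta_i - \theta_i\right| > \sigma M\right) < \varepsilon \tag{6}$$

holds, then $b_{i,n} = O(a_{i,n})$ necessarily holds.

(d) Let $\hat\theta_i$ stand for any of the estimators $\hat\theta_{H,i}$, $\hat\theta_{S,i}$, or $\hat\theta_{AS,i}$. Then the results in (a)-(c) also
hold for $\hat\theta_i$.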

The preceding theorem shows that the thresholding estimators $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and $\tilde\theta_{AS,i}$ (as well
as their infeasible versions) are uniformly $a_{i,n}$-consistent and that this rate is sharp and cannot
be improved. In particular, if the tuning is conservative these estimators are uniformly
$n^{1/2}/\xi_{i,n}$-consistent, which is the usual rate one expects to find in a linear regression model as
considered here. However, if consistent tuning is employed, the preceding theorem shows that these
thresholding estimators are then only uniformly $(\xi_{i,n}\eta_{i,n})^{-1}$-consistent, i.e., have a slower uniform
convergence rate than the least-squares (maximum likelihood) estimator (or the conservatively
tuned thresholding estimators for that matter). For a discussion of the pointwise convergence
rate see Section 6.4.
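The two regimes of the uniform rate $a_{i,n} = \min(n^{1/2}/\xi_{i,n}, (\xi_{i,n}\eta_{i,n})^{-1})$ can be made concrete with a small calculation. The sketch below is illustrative only: the tuning sequences $\eta_{i,n} = 2n^{-1/2}$ (conservative) and $\eta_{i,n} = n^{-1/4}$ (consistent) and the choice $\xi_{i,n} = 1$ are assumptions made for the example, not taken from the text.

```python
import math

def uniform_rate(n, xi, eta):
    """a_{i,n} = min(n^{1/2}/xi, 1/(xi*eta)), the uniform rate in Theorem 16(b)."""
    return min(math.sqrt(n) / xi, 1.0 / (xi * eta))

def rate_relative_to_least_squares(n, eta, xi=1.0):
    """Ratio a_{i,n} / (n^{1/2}/xi); equals min(1, 1/(n^{1/2}*eta))."""
    return uniform_rate(n, xi, eta) / (math.sqrt(n) / xi)

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        conservative = rate_relative_to_least_squares(n, eta=2.0 / math.sqrt(n))
        consistent = rate_relative_to_least_squares(n, eta=n ** -0.25)
        # Conservative tuning keeps a fixed fraction of the n^{1/2} rate,
        # consistent tuning loses a factor that tends to zero.
        print(n, conservative, consistent)
```

Under the assumed conservative sequence the ratio stays at 1/2 for every $n$, whereas under the assumed consistent sequence it equals $n^{-1/4}$ and vanishes as $n$ grows.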



Remark 17 If n^^^rj^ „ — ?> = 0, then 6i is asymptotically equivalent to ^LS,j in the sense that 
for every e > 



lim sup sup Pr 



0<(T<OO 



0. 



A similar statement holds for 9i. For 9i this follows immediately from (27) in Section [s] and the 
fact that the family of distributions corresponding to p^-k is tight; for 9i this follows from the 



relatii 



'LS,i 



Remark 18 (i) A variation of the proof of Theorem 16 shows that in case of consistent tuning
for the infeasible estimators additionally also

$$\lim_{n\to\infty} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(a_{i,n}\left|\hat\theta_i - \theta_i\right| > \sigma M\right) = 0$$

holds for every $M > 1$, and that for the feasible estimators

$$\lim_{n\to\infty} \sup_{\theta\in\mathbb{R}^k} \sup_{0<\sigma<\infty} P_{n,\theta,\sigma}\left(a_{i,n}\left|\tilde\theta_i - \theta_i\right| > \sigma M\right) = 0$$

holds for every $M > 1$ provided that $n - k \to \infty$.

(ii) Inspection of the proof shows that the conclusion of Theorem 16(c) continues to hold if
the supremum over $\mathbb{R}^k$ is replaced by the supremum over an arbitrarily small neighborhood of
0 and $\sigma$ is held fixed at an arbitrary positive value.

(iii) If $\sigma\varepsilon$ and $\sigma M$ are replaced by $\varepsilon$ and $M$, respectively, in the displays in Proposition 15
and Theorem 16 as well as in Remark 17, the resulting statements remain true provided the
suprema over $0 < \sigma < \infty$ are replaced by suprema over $0 < \sigma \le c$, where $c > 0$ is an arbitrary
real number.

5 Finite-Sample Distributions

5.1 Known-Variance Case

We next present the finite-sample distributions of the infeasible thresholding estimators. It will
turn out to be convenient to give the results for scaled versions, where the scaling factor $\alpha_{i,n}$ is
a positive real number, but is otherwise arbitrary. Note that below we suppress the dependence
of the distribution functions of the thresholding estimators on the scaling sequence $\alpha_{i,n}$ in the
notation. Furthermore, observe that the finite-sample distributions depend on $\theta$ only through $\theta_i$.

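Proposition 19 The cdf $H_{H,n,\theta,\sigma} := H_{H,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\hat\theta_{H,i} - \theta_i)$ is given by

$$\begin{aligned}
H_{H,n,\theta,\sigma}(x) ={}& \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n})\right)\mathbf{1}\left(\left|\alpha_{i,n}^{-1}x + \theta_i/\sigma\right| > \xi_{i,n}\eta_{i,n}\right) \\
&+ \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}\right)\right)\mathbf{1}\left(0 \le \alpha_{i,n}^{-1}x + \theta_i/\sigma \le \xi_{i,n}\eta_{i,n}\right) \\
&+ \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}\right)\right)\mathbf{1}\left(-\xi_{i,n}\eta_{i,n} \le \alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right),
\end{aligned} \tag{7}$$

or, equivalently,

$$\begin{aligned}
dH_{H,n,\theta,\sigma}(x) ={}& \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}\right)\right)\right\} d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ \left(n^{1/2}/(\alpha_{i,n}\xi_{i,n})\right)\phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n})\right)\mathbf{1}\left(\left|\alpha_{i,n}^{-1}x + \theta_i/\sigma\right| > \xi_{i,n}\eta_{i,n}\right) dx,
\end{aligned} \tag{8}$$

where $\delta_z$ denotes pointmass at $z$.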

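Proposition 20 The cdf $H_{S,n,\theta,\sigma} := H_{S,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\hat\theta_{S,i} - \theta_i)$ is given by

$$\begin{aligned}
H_{S,n,\theta,\sigma}(x) ={}& \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma \ge 0\right) \\
&+ \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right),
\end{aligned} \tag{9}$$

or, equivalently,

$$\begin{aligned}
dH_{S,n,\theta,\sigma}(x) ={}& \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}\right)\right)\right\} d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ \left(n^{1/2}/(\alpha_{i,n}\xi_{i,n})\right)\Big\{\phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma > 0\right) \\
&\qquad + \phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right)\Big\} dx.
\end{aligned} \tag{10}$$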
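Proposition 21 The cdf $H_{AS,n,\theta,\sigma} := H_{AS,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\hat\theta_{AS,i} - \theta_i)$ is given by

$$H_{AS,n,\theta,\sigma}(x) = \Phi\left(z^{(2)}_{n,\theta,\sigma}(x,\eta_{i,n})\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma \ge 0\right) + \Phi\left(z^{(1)}_{n,\theta,\sigma}(x,\eta_{i,n})\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right), \tag{11}$$

where $z^{(1)}_{n,\theta,\sigma}(x,y) \le z^{(2)}_{n,\theta,\sigma}(x,y)$ are defined by

$$0.5\,n^{1/2}\xi_{i,n}^{-1}\left(\alpha_{i,n}^{-1}x - \theta_i/\sigma\right) \pm n^{1/2}\sqrt{\left(0.5\,\xi_{i,n}^{-1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma\right)\right)^2 + y^2}.$$

Or, equivalently,

$$\begin{aligned}
dH_{AS,n,\theta,\sigma}(x) ={}& \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}\right)\right)\right\} d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ \left(0.5\,n^{1/2}/(\alpha_{i,n}\xi_{i,n})\right)\Big\{\phi\left(z^{(2)}_{n,\theta,\sigma}(x,\eta_{i,n})\right)\left(1 + t_{n,\theta,\sigma}(x,\eta_{i,n})\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma > 0\right) \\
&\qquad + \phi\left(z^{(1)}_{n,\theta,\sigma}(x,\eta_{i,n})\right)\left(1 - t_{n,\theta,\sigma}(x,\eta_{i,n})\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right)\Big\} dx,
\end{aligned}$$

where $t_{n,\theta,\sigma}(x,y) = 0.5\,\xi_{i,n}^{-1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma\right) \Big/ \left(\left(0.5\,\xi_{i,n}^{-1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma\right)\right)^2 + y^2\right)^{1/2}$.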

The finite-sample distributions of $\hat\theta_{H,i}$, $\hat\theta_{S,i}$, and $\hat\theta_{AS,i}$ are seen to be non-normal. They are
made up of two components, one being a multiple of pointmass at $-\alpha_{i,n}\theta_i/\sigma$ and the other one
being absolutely continuous with a density that is generally bimodal. For more discussion and
some graphical illustrations in a special case see Pötscher and Leeb (2009) and Pötscher and
Schneider (2009).
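The cdfs in (7), (9), and (11) can be checked against simulation. The following sketch is an illustration under stated assumptions rather than part of the paper: it assumes the estimator definitions from the earlier sections (hard-thresholding keeps $\hat\theta_{LS,i}$ when $|\hat\theta_{LS,i}| > \sigma\xi_{i,n}\eta_{i,n}$, soft-thresholding shrinks by the threshold, and adaptive soft-thresholding multiplies by $(1 - \sigma^2\xi_{i,n}^2\eta_{i,n}^2/\hat\theta_{LS,i}^2)_+$), and all function names and parameter values are hypothetical.

```python
import math
import random

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_hard(x, n, theta, sigma, xi, eta, alpha):
    """Equation (7)."""
    u = x / alpha + theta / sigma
    if abs(u) > xi * eta:
        return Phi(math.sqrt(n) * x / (alpha * xi))
    sign = 1.0 if u >= 0.0 else -1.0
    return Phi(math.sqrt(n) * (-theta / (sigma * xi) + sign * eta))

def cdf_soft(x, n, theta, sigma, xi, eta, alpha):
    """Equation (9)."""
    u = x / alpha + theta / sigma
    shift = eta if u >= 0.0 else -eta
    return Phi(math.sqrt(n) * (x / (alpha * xi) + shift))

def cdf_adaptive(x, n, theta, sigma, xi, eta, alpha):
    """Equation (11), with z^(2) used for u >= 0 and z^(1) for u < 0."""
    u = x / alpha + theta / sigma
    base = 0.5 * (x / alpha - theta / sigma) / xi
    root = math.sqrt((0.5 * u / xi) ** 2 + eta ** 2)
    return Phi(math.sqrt(n) * (base + root if u >= 0.0 else base - root))

def simulate(n, theta, sigma, xi, eta, alpha, reps, rng):
    """Draw scaled estimation errors of the three infeasible estimators (assumed definitions)."""
    thr = sigma * xi * eta
    out = {"hard": [], "soft": [], "adaptive": []}
    for _ in range(reps):
        y = rng.gauss(theta, sigma * xi / math.sqrt(n))  # least-squares component
        estimates = {
            "hard": y if abs(y) > thr else 0.0,
            "soft": math.copysign(max(abs(y) - thr, 0.0), y),
            "adaptive": y * max(1.0 - thr * thr / (y * y), 0.0),
        }
        for name, est in estimates.items():
            out[name].append(alpha * (est - theta) / sigma)
    return out

if __name__ == "__main__":
    p = dict(n=100, theta=0.15, sigma=1.0, xi=1.0, eta=0.2, alpha=10.0)
    draws = simulate(reps=50000, rng=random.Random(1), **p)
    for x in (-4.0, -1.0, 1.0):
        emp = sum(v <= x for v in draws["hard"]) / len(draws["hard"])
        print(x, round(emp, 3), round(cdf_hard(x, **p), 3))
```

The simulated and analytic values should agree up to Monte Carlo error; the flat stretch of each cdf around $x = -\alpha_{i,n}\theta_i/\sigma$ corresponds to the pointmass described above.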

Remark 22 In the case where $X'X$ is diagonal, the estimators of the components $\theta_i$ and $\theta_j$
for $i \neq j$ are independent and hence the above results immediately allow one to determine the
finite-sample distributions of the entire vectors $\hat\theta_H$, $\hat\theta_S$, and $\hat\theta_{AS}$. In particular, this provides
the finite-sample distribution of the Lasso and the adaptive Lasso in the diagonal case
(cf. Remarks 1 and 2).

5.2 Unknown-Variance Case

The finite-sample distributions of $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, $\tilde\theta_{AS,i}$ are obtained next. The same remark on the
scaling as in the previous section applies here.

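Proposition 23 The cdf $H^*_{H,n,\theta,\sigma} := H^*_{H,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\tilde\theta_{H,i} - \theta_i)$ is given by

$$\begin{aligned}
H^*_{H,n,\theta,\sigma}(x) ={}& \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n})\right)\int_0^\infty \mathbf{1}\left(\left|\alpha_{i,n}^{-1}x + \theta_i/\sigma\right| > \xi_{i,n}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds \\
&+ \int_0^\infty \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n}\right)\right)\mathbf{1}\left(0 \le \alpha_{i,n}^{-1}x + \theta_i/\sigma \le \xi_{i,n}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds \\
&+ \int_0^\infty \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n}\right)\right)\mathbf{1}\left(-\xi_{i,n}s\eta_{i,n} \le \alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right)\rho_{n-k}(s)\,ds.
\end{aligned} \tag{12}$$

Or, equivalently,

$$\begin{aligned}
dH^*_{H,n,\theta,\sigma}(x) ={}& \int_0^\infty \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n}\right)\right)\right\}\rho_{n-k}(s)\,ds\, d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ n^{1/2}\alpha_{i,n}^{-1}\xi_{i,n}^{-1}\phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n})\right)\int_0^\infty \mathbf{1}\left(\left|\alpha_{i,n}^{-1}x + \theta_i/\sigma\right| > \xi_{i,n}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds\,dx.
\end{aligned} \tag{13}$$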
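Proposition 24 The cdf $H^*_{S,n,\theta,\sigma} := H^*_{S,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\tilde\theta_{S,i} - \theta_i)$ is given by

$$\begin{aligned}
H^*_{S,n,\theta,\sigma}(x) ={}& \int_0^\infty \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma \ge 0\right) \\
&+ \int_0^\infty \Phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right) \\
={}& T_{n-k,\,-n^{1/2}x/(\alpha_{i,n}\xi_{i,n})}\left(n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma \ge 0\right) \\
&+ T_{n-k,\,-n^{1/2}x/(\alpha_{i,n}\xi_{i,n})}\left(-n^{1/2}\eta_{i,n}\right)\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right).
\end{aligned} \tag{14}$$

Or, equivalently,

$$\begin{aligned}
dH^*_{S,n,\theta,\sigma}(x) ={}& \int_0^\infty \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n}\right)\right)\right\}\rho_{n-k}(s)\,ds\, d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ n^{1/2}\alpha_{i,n}^{-1}\xi_{i,n}^{-1}\Big\{\int_0^\infty \phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma > 0\right) \\
&\qquad + \int_0^\infty \phi\left(n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}s\eta_{i,n}\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right)\Big\} dx.
\end{aligned} \tag{15}$$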
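Proposition 25 The cdf $H^*_{AS,n,\theta,\sigma} := H^*_{AS,\eta_{i,n},n,\theta,\sigma}$ of $\sigma^{-1}\alpha_{i,n}(\tilde\theta_{AS,i} - \theta_i)$ is given by

$$\begin{aligned}
H^*_{AS,n,\theta,\sigma}(x) ={}& \int_0^\infty \Phi\left(z^{(2)}_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma \ge 0\right) \\
&+ \int_0^\infty \Phi\left(z^{(1)}_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right).
\end{aligned} \tag{16}$$

Or, equivalently,

$$\begin{aligned}
dH^*_{AS,n,\theta,\sigma}(x) ={}& \int_0^\infty \left\{\Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\left(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n}\right)\right)\right\}\rho_{n-k}(s)\,ds\, d\delta_{-\alpha_{i,n}\theta_i/\sigma}(x) \\
&+ \left(0.5\,n^{1/2}/(\alpha_{i,n}\xi_{i,n})\right)\Big\{\int_0^\infty \phi\left(z^{(2)}_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\left(1 + t_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma > 0\right) \\
&\qquad + \int_0^\infty \phi\left(z^{(1)}_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\left(1 - t_{n,\theta,\sigma}(x, s\eta_{i,n})\right)\rho_{n-k}(s)\,ds\,\mathbf{1}\left(\alpha_{i,n}^{-1}x + \theta_i/\sigma < 0\right)\Big\} dx.
\end{aligned} \tag{17}$$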

As in the known-variance case the distributions are a convex combination of pointmass and
an absolutely continuous part. In case of hard-thresholding, the averaging with respect to the
density $\rho_{n-k}$ smoothes the indicator functions leading to a continuous density function for the
absolutely continuous part (while in the known-variance case the density function is only
piecewise continuous, cf. Figure 1 in Pötscher and Leeb (2009)). This is not so for soft-thresholding
and adaptive soft-thresholding, where the averaging with respect to the density $\rho_{n-k}$ does not
affect the indicator functions involved; here the shape of the distribution is qualitatively the same
as in the known-variance case (Figure 2 in Pötscher and Leeb (2009) and Figure 1 in Pötscher
and Schneider (2009)).
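The mixture structure of the unknown-variance cdfs can be illustrated for soft-thresholding by averaging (9), with $\eta_{i,n}$ replaced by $s\eta_{i,n}$, over $\rho_{n-k}$. The sketch below rests on assumptions: $\rho_{n-k}$ is taken to be the density of $(\chi^2_{n-k}/(n-k))^{1/2}$, the integral is approximated by a midpoint rule with a heuristic cut-off, and the names `cdf_soft_known`, `cdf_soft_unknown`, and `df` (standing for $n-k$) are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rho(s, df):
    """Assumed density of sqrt(chi2_df / df), i.e. of sigma_hat/sigma."""
    if s <= 0.0:
        return 0.0
    log_density = (math.log(2.0) + 0.5 * df * math.log(0.5 * df) - math.lgamma(0.5 * df)
                   + (df - 1.0) * math.log(s) - 0.5 * df * s * s)
    return math.exp(log_density)

def integrate(f, a, b, steps=4000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) for j in range(steps))

def cdf_soft_known(x, n, theta, sigma, xi, eta, alpha):
    """Equation (9): known-variance soft-thresholding cdf."""
    u = x / alpha + theta / sigma
    shift = eta if u >= 0.0 else -eta
    return Phi(math.sqrt(n) * (x / (alpha * xi) + shift))

def cdf_soft_unknown(x, n, df, theta, sigma, xi, eta, alpha):
    """First form of (14): average (9) with eta -> s*eta against rho_df."""
    upper = 1.0 + 12.0 / math.sqrt(df)  # heuristic cut-off for the integral
    return integrate(
        lambda s: cdf_soft_known(x, n, theta, sigma, xi, s * eta, alpha) * rho(s, df),
        0.0, upper)

def sup_gap(df, params, grid):
    """Largest difference between the two cdfs over a finite grid of x values."""
    return max(abs(cdf_soft_unknown(x, df=df, **params) - cdf_soft_known(x, **params))
               for x in grid)

if __name__ == "__main__":
    p = dict(n=100, theta=0.0, sigma=1.0, xi=1.0, eta=0.2, alpha=10.0)
    grid = [-4.0 + 0.5 * j for j in range(17)]
    for df in (2, 20, 1000):
        print(df, sup_gap(df, p, grid))
```

With few degrees of freedom the two cdfs differ noticeably, while for large `df` the gap on the grid becomes small; the indicator in (9) does not depend on $s$, which is why only the normal-cdf term is averaged.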
Remark 26 In the case where $X'X$ is diagonal, the finite-sample distributions of the entire
vectors $\tilde\theta_H$, $\tilde\theta_S$, and $\tilde\theta_{AS}$ can be found from the distributions of $\hat\theta_H$, $\hat\theta_S$, and $\hat\theta_{AS}$ (see Remark 22)
by conditioning on $\hat\sigma = s\sigma$ and integrating with respect to $\rho_{n-k}(s)$. In particular, this provides
the finite-sample distributions of the Lasso $\tilde\theta_S$ and the adaptive Lasso $\tilde\theta_{AS}$ in the diagonal case
(cf. Remarks 1 and 2).



6 Large-Sample Distributions 

We next derive the asymptotic distributions of the thresholding estimators under a
moving-parameter (and not only under a fixed-parameter) framework since it is well-known that
asymptotics based only on a fixed-parameter framework often lead to misleading conclusions regarding
the performance of the estimators (cf. also the discussion in Section 6.4).

6.1 The Known-Variance Case

We first consider the infeasible versions of the thresholding estimators. 

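Proposition 27 Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we
have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \le e_i \le \infty$.

(a) Assume $e_i < \infty$. Set the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$. Then
$H_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution with cdf

$$\Phi(x)\mathbf{1}(|x + \nu_i| > e_i) + \Phi(-\nu_i + e_i)\mathbf{1}(0 \le x + \nu_i \le e_i) + \Phi(-\nu_i - e_i)\mathbf{1}(-e_i \le x + \nu_i < 0),$$

the corresponding measure being

$$\{\Phi(-\nu_i + e_i) - \Phi(-\nu_i - e_i)\}\,d\delta_{-\nu_i}(x) + \phi(x)\mathbf{1}(|x + \nu_i| > e_i)\,dx. \tag{18}$$

[This distribution reduces to a standard normal distribution in case $|\nu_i| = \infty$ or $e_i = 0$.]

(b) Assume $e_i = \infty$. Set the scaling factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true
parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

1. If $|\zeta_i| < 1$, then $H_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\zeta_i}$.

2. If $|\zeta_i| > 1$, then $H_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_0$.

3. If $|\zeta_i| = 1$ and $n^{1/2}\left(\eta_{i,n} - \zeta_i\theta_{i,n}/(\sigma_n\xi_{i,n})\right) \to r_i$ for some $r_i \in \bar{\mathbb{R}}$, then $H_{H,n,\theta^{(n)},\sigma_n}$
converges weakly to

$$\Phi(r_i)\delta_{-\zeta_i} + (1 - \Phi(r_i))\delta_0.$$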

Proposition 28 Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we
have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \le e_i \le \infty$.

(a) Assume $e_i < \infty$. Set the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$. Then
$H_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution with cdf

$$\Phi(x + e_i)\mathbf{1}(x + \nu_i \ge 0) + \Phi(x - e_i)\mathbf{1}(x + \nu_i < 0),$$

the corresponding measure being

$$\{\Phi(-\nu_i + e_i) - \Phi(-\nu_i - e_i)\}\,d\delta_{-\nu_i}(x) + \{\phi(x + e_i)\mathbf{1}(x + \nu_i > 0) + \phi(x - e_i)\mathbf{1}(x + \nu_i < 0)\}\,dx. \tag{19}$$

[This distribution reduces to a $N(-\mathrm{sign}(\nu_i)e_i, 1)$-distribution in case $|\nu_i| = \infty$ or $e_i = 0$.]

(b) Assume $e_i = \infty$. Set the scaling factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true
parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.
Then $H_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\mathrm{sign}(\zeta_i)\min(1,|\zeta_i|)}$.

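Proposition 29 Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we
have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \le e_i \le \infty$.

(a) Assume $e_i < \infty$. Set the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$. Then
$H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution with cdf

$$\Phi\left(0.5(x - \nu_i) + \sqrt{(0.5(x + \nu_i))^2 + e_i^2}\right)\mathbf{1}(x + \nu_i \ge 0) + \Phi\left(0.5(x - \nu_i) - \sqrt{(0.5(x + \nu_i))^2 + e_i^2}\right)\mathbf{1}(x + \nu_i < 0)$$

in case $|\nu_i| < \infty$, the corresponding measure being

$$\begin{aligned}
&\{\Phi(-\nu_i + e_i) - \Phi(-\nu_i - e_i)\}\,d\delta_{-\nu_i}(x) \\
&+ 0.5\Big\{\phi\left(0.5(x - \nu_i) + \sqrt{(0.5(x + \nu_i))^2 + e_i^2}\right)(1 + t(x))\mathbf{1}(x + \nu_i > 0) \\
&\qquad + \phi\left(0.5(x - \nu_i) - \sqrt{(0.5(x + \nu_i))^2 + e_i^2}\right)(1 - t(x))\mathbf{1}(x + \nu_i < 0)\Big\}\,dx,
\end{aligned} \tag{20}$$

where $t(x) = (x + \nu_i)\big/\left((x + \nu_i)^2 + 4e_i^2\right)^{1/2}$. In case $|\nu_i| = \infty$, the cdf $H_{AS,n,\theta^{(n)},\sigma_n}$ converges
weakly to $\Phi$, i.e., to a standard normal distribution. [In case $e_i = 0$ the limit always reduces to
a standard normal distribution.]

(b) Assume $e_i = \infty$. Set the scaling factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true
parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

1. If $|\zeta_i| \le 1$, then $H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\zeta_i}$.

2. If $1 < |\zeta_i| < \infty$, then $H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-1/\zeta_i}$.

3. If $|\zeta_i| = \infty$, then $H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_0$.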

Observe that the scaling factors $\alpha_{i,n}$ used in the above propositions are exactly of the same
order as $a_{i,n}$ in the case of conservative as well as in the case of consistent tuning and thus
correspond to the uniform rate of convergence in both cases. In the case of conservative tuning the
limiting distributions have essentially the same form as the finite-sample distributions,
demonstrating that the moving-parameter asymptotic framework captures the finite-sample behavior
of the estimators in a satisfactory way. In contrast, a fixed-parameter asymptotic framework,
which corresponds to setting $\theta_{i,n} \equiv \theta_i$ and $\sigma_n \equiv \sigma$ in the above propositions, misrepresents the
finite-sample properties of the thresholding estimators whenever $\theta_i \neq 0$ but small, as the
fixed-parameter limiting distribution is (in case of hard-thresholding and adaptive soft-thresholding)
then always $N(0, 1)$, regardless of the size of $\theta_i$. For soft-thresholding we also observe a strong
discrepancy between the finite-sample distribution and the fixed-parameter limit for $\theta_i \neq 0$, which
is given by $N(-\mathrm{sign}(\theta_i)e_i, 1)$. In particular, the above propositions demonstrate non-uniformity
in the convergence of finite-sample distributions to their limit in a fixed-parameter framework.
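The remark that the conservatively tuned limits mirror the finite-sample form can be checked directly for hard-thresholding. The sketch below is a hypothetical illustration: it picks a moving-parameter sequence with $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) = \nu_i$ and $n^{1/2}\eta_{i,n} = e_i$ exactly for every $n$ (an assumption made only to keep the comparison clean), evaluates the finite-sample cdf (7) with $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$, and compares it with the limit cdf of Proposition 27(a) and with the fixed-parameter limit $\Phi$.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def limit_cdf_hard(x, nu, e):
    """Limit cdf from Proposition 27(a) (conservative tuning, e < infinity)."""
    u = x + nu
    if abs(u) > e:
        return Phi(x)
    return Phi(-nu + e) if u >= 0.0 else Phi(-nu - e)

def finite_cdf_hard(x, n, theta, sigma, xi, eta):
    """Equation (7) with the scaling alpha = n^{1/2}/xi."""
    alpha = math.sqrt(n) / xi
    u = x / alpha + theta / sigma
    if abs(u) > xi * eta:
        return Phi(math.sqrt(n) * x / (alpha * xi))
    sign = 1.0 if u >= 0.0 else -1.0
    return Phi(math.sqrt(n) * (-theta / (sigma * xi) + sign * eta))

def local_parameters(n, nu, e, sigma=1.0, xi=1.0):
    """Hypothetical sequence with n^{1/2} theta/(sigma xi) = nu and n^{1/2} eta = e."""
    return dict(n=n, theta=nu * sigma * xi / math.sqrt(n), sigma=sigma, xi=xi,
                eta=e / math.sqrt(n))

if __name__ == "__main__":
    nu, e = 1.0, 2.0
    for x in (-4.0, -2.0, -0.5, 0.5, 2.0):
        finite = finite_cdf_hard(x, **local_parameters(400, nu, e))
        # Third column: the N(0,1) cdf, i.e. the fixed-parameter limit for theta_i != 0.
        print(x, round(finite, 4), round(limit_cdf_hard(x, nu, e), 4), round(Phi(x), 4))
```

Along this sequence the finite-sample cdf coincides with the moving-parameter limit up to rounding, whereas the standard normal cdf can be far from both inside the thresholding region.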
In the case of consistent tuning we observe an interesting phenomenon, namely that the
limiting distributions now correspond to pointmasses (but not always located at zero!), or are
convex combinations of two pointmasses in some cases when considering the hard-thresholding
estimator. This essentially means that consistently tuned thresholding estimators are plagued by
a bias-problem in that the "bias-component" is the dominant component and is of larger order
than the "stochastic variability" of the estimator. In a fixed-parameter framework we get the
trivial limits $\delta_0$ for every value of $\theta_i$ in case of hard-thresholding and adaptive soft-thresholding.
At first glance this seems to suggest that we have used a scaling sequence that does not increase
fast enough with $n$, but recall that the scaling used here corresponds to the uniform convergence
rate. We shall take this issue further up in Section 6.4. The situation is different for the
soft-thresholding estimator where the fixed-parameter limit is $\delta_{-\mathrm{sign}(\theta_i)}$, which reduces to $\delta_0$ only
for $\theta_i = 0$; this is a reflection of the well-known fact that soft-thresholding is plagued by bias
problems to a higher degree than are hard-thresholding and adaptive soft-thresholding.



6.2 Uniform Closeness of Distributions in the Known- and Unknown-Variance Case

We next show that the finite-sample cdfs of $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and $\tilde\theta_{AS,i}$ and of their infeasible
counterparts $\hat\theta_{H,i}$, $\hat\theta_{S,i}$, and $\hat\theta_{AS,i}$, respectively, are uniformly (with respect to the parameters) close in
the total variation distance (or the supremum norm) provided the number of degrees of freedom
$n - k$ diverges to infinity fast enough. Apart from being of interest in their own right, these
results will be instrumental in the subsequent section. We note that the results in Theorem 30
below hold for any choice of the scaling factors $\alpha_{i,n}$.

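Theorem 30 Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for large enough $n$ we have
$n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ as $n \to \infty$. Then

$$\sup_{\theta\in\mathbb{R}^k,\,0<\sigma<\infty} \left\| H_{H,n,\theta,\sigma} - H^*_{H,n,\theta,\sigma} \right\|_{TV} \to 0 \quad \text{for } n \to \infty,$$

$$\sup_{\theta\in\mathbb{R}^k,\,0<\sigma<\infty} \left\| H_{S,n,\theta,\sigma} - H^*_{S,n,\theta,\sigma} \right\|_{TV} \to 0 \quad \text{for } n \to \infty,$$

$$\sup_{\theta\in\mathbb{R}^k,\,0<\sigma<\infty} \left\| H_{AS,n,\theta,\sigma} - H^*_{AS,n,\theta,\sigma} \right\|_{\infty} \to 0 \quad \text{for } n \to \infty$$

hold.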



Remark 31 In case of conservative tuning, the condition $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ is always
satisfied if $n - k \to \infty$. [In fact it is then equivalent to $n - k \to \infty$ or $e_i = 0$.] In case
of consistent tuning $n - k \to \infty$ is clearly a weaker condition than $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$.
However, in general, a sufficient condition for $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ is that $\eta_{i,n} \to 0$ and
$\limsup_{n\to\infty} k/n < 1$.
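The sufficiency claim at the end of Remark 31 follows from a one-line rewriting, sketched here for convenience; the constant $c$ is introduced only for this illustration.

```latex
% Rewrite the ratio in terms of k/n:
\frac{n^{1/2}\eta_{i,n}}{(n-k)^{1/2}} \;=\; \eta_{i,n}\,\bigl(1 - k/n\bigr)^{-1/2}.
% If \limsup_{n\to\infty} k/n < 1, there is a constant c < 1 with k/n \le c for all large n, hence
\frac{n^{1/2}\eta_{i,n}}{(n-k)^{1/2}} \;\le\; \eta_{i,n}\,(1-c)^{-1/2} \;\longrightarrow\; 0
\qquad \text{whenever } \eta_{i,n} \to 0 .
```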

For the hard-thresholding estimator some randomness survives in the limit in the case $|\zeta_i| = 1$, where we can
achieve a limiting probability for $\hat\theta_{H,i} = 0$ that is strictly between 0 and 1. That this randomness does not survive
for the other two estimators in the limit seems to be connected to the fact that these estimators are continuous
functions of the data, whereas $\hat\theta_{H,i}$ is not.

Uniform closeness of the respective cdfs of the adaptive soft-thresholding estimators in the total variation
distance, and not only in the supremum norm, could probably be obtained at the expense of a more cumbersome
proof. We do not pursue this.
Remark 32 Suppose that $\xi_{i,n}\eta_{i,n} \to 0$ holds as $n \to \infty$. If $n^{1/2}\eta_{i,n}(n-k)^{-1/2}$ does not converge
to zero as $n \to \infty$, Remark 14 shows that none of the convergence results in Theorem 30 holds.
[To see this note that the variable deletion probabilities constitute the weight of the pointmass
in the respective distribution functions.] This shows that the condition $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$
in the above theorem cannot be weakened (at least in case $\xi_{i,n}\eta_{i,n} \to 0$ holds).

6.3 The Unknown-Variance Case
6.3.1 Conservative Tuning

We next obtain the limiting distributions of $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and $\tilde\theta_{AS,i}$ in a moving-parameter framework
under conservative tuning.

Theorem 33 (Hard-thresholding with conservative tuning) Suppose that for given $i \ge 1$
satisfying $i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where
$0 \le e_i < \infty$. Set the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$.

(a) If $n - k$ is eventually constant equal to $m$, say, then $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to the
distribution with cdf

$$\int_0^\infty \left\{\Phi(x)\mathbf{1}(|x + \nu_i| > s e_i) + \Phi(-\nu_i + s e_i)\mathbf{1}(0 \le x + \nu_i \le s e_i) + \Phi(-\nu_i - s e_i)\mathbf{1}(-s e_i \le x + \nu_i < 0)\right\}\rho_m(s)\,ds,$$

the corresponding measure being

$$\int_0^\infty \{\Phi(-\nu_i + s e_i) - \Phi(-\nu_i - s e_i)\}\rho_m(s)\,ds\,d\delta_{-\nu_i}(x) + \phi(x)\int_0^\infty \mathbf{1}(|x + \nu_i| > s e_i)\rho_m(s)\,ds\,dx. \tag{21}$$

[The distribution reduces to a standard normal distribution in case $|\nu_i| = \infty$ or $e_i = 0$.]

(b) If $n - k \to \infty$ holds, then $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution given in
Proposition 27(a).

Theorem 34 (Soft-thresholding with conservative tuning) Suppose that for given $i \ge 1$ satisfying
$i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where $0 \le e_i < \infty$. Set
the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in
\mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$.

(a) If $n - k$ is eventually constant equal to $m$, say, then $H^*_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to the
distribution with cdf

$$\int_0^\infty \{\Phi(x + s e_i)\mathbf{1}(x + \nu_i \ge 0) + \Phi(x - s e_i)\mathbf{1}(x + \nu_i < 0)\}\rho_m(s)\,ds,$$

the corresponding measure being

$$\begin{aligned}
&\int_0^\infty \{\Phi(-\nu_i + s e_i) - \Phi(-\nu_i - s e_i)\}\rho_m(s)\,ds\,d\delta_{-\nu_i}(x) \\
&+ \int_0^\infty \{\phi(x + s e_i)\mathbf{1}(x + \nu_i > 0) + \phi(x - s e_i)\mathbf{1}(x + \nu_i < 0)\}\rho_m(s)\,ds\,dx.
\end{aligned} \tag{22}$$

[The atomic part in the above expression is absent in case $|\nu_i| = \infty$. Furthermore, the distribution
reduces to a standard normal distribution if $e_i = 0$.]

(b) If $n - k \to \infty$ holds, then $H^*_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution given in
Proposition 28(a).



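Theorem 35 (Adaptive soft-thresholding with conservative tuning) Suppose that for given $i \ge 1$
satisfying $i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to e_i$ where
$0 \le e_i < \infty$. Set the scaling factor $\alpha_{i,n} = n^{1/2}/\xi_{i,n}$. Suppose that the true parameters $\theta^{(n)} =
(\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$.

(a) Suppose $n - k$ is eventually constant equal to $m$, say. Then $H^*_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly
to the distribution with cdf

$$\begin{aligned}
&\int_0^\infty \Phi\left(0.5(x - \nu_i) + \sqrt{(0.5(x + \nu_i))^2 + s^2e_i^2}\right)\rho_m(s)\,ds\,\mathbf{1}(x + \nu_i \ge 0) \\
&+ \int_0^\infty \Phi\left(0.5(x - \nu_i) - \sqrt{(0.5(x + \nu_i))^2 + s^2e_i^2}\right)\rho_m(s)\,ds\,\mathbf{1}(x + \nu_i < 0)
\end{aligned} \tag{23}$$

in case $|\nu_i| < \infty$, the corresponding measure being given by

$$\begin{aligned}
&\int_0^\infty \{\Phi(-\nu_i + s e_i) - \Phi(-\nu_i - s e_i)\}\rho_m(s)\,ds\,d\delta_{-\nu_i}(x) \\
&+ 0.5\int_0^\infty \Big\{\phi\left(0.5(x - \nu_i) + \sqrt{(0.5(x + \nu_i))^2 + s^2e_i^2}\right)(1 + t(x,s))\mathbf{1}(x + \nu_i > 0) \\
&\qquad + \phi\left(0.5(x - \nu_i) - \sqrt{(0.5(x + \nu_i))^2 + s^2e_i^2}\right)(1 - t(x,s))\mathbf{1}(x + \nu_i < 0)\Big\}\rho_m(s)\,ds\,dx,
\end{aligned}$$

where $t(x,s) = (x + \nu_i)\big/\left((x + \nu_i)^2 + 4s^2e_i^2\right)^{1/2}$. In case $|\nu_i| = \infty$, the cdf $H^*_{AS,n,\theta^{(n)},\sigma_n}$ converges
weakly to $\Phi$, i.e., a standard normal distribution. [If $e_i = 0$, the limit always reduces to a standard
normal distribution.]

(b) If $n - k \to \infty$, then $H^*_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to the distribution given in Proposition
29(a).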



It transpires that in case of conservative tuning and $n - k \to \infty$ we obtain exactly the same
limiting distributions as in the known-variance case and hence the relevant discussion given at the
end of Section 6.1 applies also here. [That one obtains the same limits does not come as a surprise
given the results in Section 6.2 and the observation made in Remark 31.] In the case where
$n - k$ is eventually constant, the limits are obtained from the limits in the known-variance case
(with $\sigma$ replaced by $s\sigma$) by averaging with respect to the distribution of $\hat\sigma/\sigma$. Again the limiting
distributions essentially have the same structure as the corresponding finite-sample distributions.
The fixed-parameter limiting distributions (corresponding to setting $\theta_{i,n} \equiv \theta_i$ and $\sigma_n \equiv \sigma$ in the
above theorems) again misrepresent the finite-sample properties of the thresholding estimators
whenever $\theta_i \neq 0$ but small, as the fixed-parameter limiting distribution is (in case of
hard-thresholding and adaptive soft-thresholding) then always $N(0, 1)$, regardless of the size of $\theta_i$.
For soft-thresholding we also observe a strong discrepancy between the finite-sample distribution
and the fixed-parameter limit especially for $\theta_i \neq 0$ but small, which is given by the distribution
with pdf $\int_0^\infty \phi(x + s\,\mathrm{sign}(\theta_i)e_i)\rho_m(s)\,ds$ regardless of the size of $\theta_i$. As a consequence, we again
observe non-uniformity in the convergence of finite-sample distributions to their limit in a
fixed-parameter framework also in the case where the number of degrees of freedom is (eventually)
constant.
6.3.2 Consistent Tuning

We next derive the limiting distributions of $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and $\tilde\theta_{AS,i}$ in a moving-parameter framework
under consistent tuning.

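Theorem 36 (Hard-thresholding with consistent tuning) Suppose that for given $i \ge 1$ satisfying
$i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Set the scaling
factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

(a) If $n - k$ is eventually constant equal to $m$, say, then $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to

$$\left(\int_{|\zeta_i|}^\infty \rho_m(s)\,ds\right)\delta_{-\zeta_i} + \left(1 - \int_{|\zeta_i|}^\infty \rho_m(s)\,ds\right)\delta_0.$$

[The above display reduces to $\delta_0$ for $|\zeta_i| = \infty$.]

(b) If $n - k \to \infty$ holds, then

1. $|\zeta_i| < 1$ implies that $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\zeta_i}$.

2. $|\zeta_i| > 1$ implies that $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_0$.

3. $|\zeta_i| = 1$ and $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to 0$ imply that $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to

$$\Phi(r_i)\delta_{-\zeta_i} + (1 - \Phi(r_i))\delta_0$$

provided $r_{i,n} = n^{1/2}\left(\eta_{i,n} - \zeta_i\theta_{i,n}/(\sigma_n\xi_{i,n})\right) \to r_i$ for some $r_i \in \bar{\mathbb{R}}$.

4. $|\zeta_i| = 1$ and $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to 2^{1/2}d_i$ with $0 < d_i < \infty$ imply that $H^*_{H,n,\theta^{(n)},\sigma_n}$
converges weakly to

$$\left(\int_{-\infty}^{\infty} \Phi(d_i t + r_i)\phi(t)\,dt\right)\delta_{-\zeta_i} + \left(1 - \int_{-\infty}^{\infty} \Phi(d_i t + r_i)\phi(t)\,dt\right)\delta_0$$

provided $r_{i,n} \to r_i$ for some $r_i \in \bar{\mathbb{R}}$. [Note that the above display reduces to $\delta_{-\zeta_i}$ if $r_i = \infty$, and
to $\delta_0$ if $r_i = -\infty$.]

5. $|\zeta_i| = 1$ and $n^{1/2}\eta_{i,n}/(n-k)^{1/2} \to \infty$ imply that $H^*_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to

$$\Phi(r_i')\delta_{-\zeta_i} + (1 - \Phi(r_i'))\delta_0$$

provided $\left(n^{1/2}\eta_{i,n}/(n-k)^{1/2}\right)^{-1} r_{i,n} \to 2^{-1/2}r_i'$ for some $r_i' \in \bar{\mathbb{R}}$.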

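Theorem 37 (Soft-thresholding with consistent tuning) Suppose that for given $i \ge 1$ satisfying
$i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Set the scaling
factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n}, \ldots, \theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

(a) If $n - k$ is eventually constant equal to $m$, say, then $H^*_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to the
distribution given by

$$\begin{aligned}
&\int_{|\zeta_i|}^\infty \rho_m(s)\,ds\,d\delta_{-\zeta_i}(x) + \{\rho_m(x)\mathbf{1}(x + \zeta_i < 0) + \rho_m(-x)\mathbf{1}(x + \zeta_i > 0)\}\,dx \\
&= \Pr(\chi^2_m > m\zeta_i^2)\,d\delta_{-\zeta_i}(x) + \{\rho_m(x)\mathbf{1}(x + \zeta_i < 0) + \rho_m(-x)\mathbf{1}(x + \zeta_i > 0)\}\,dx,
\end{aligned} \tag{24}$$

where we recall the convention that $\rho_m(x) = 0$ for $x < 0$. [In case $|\zeta_i| = \infty$, the atomic part in
(24) is absent and (24) reduces to $\rho_m(-\mathrm{sign}(\zeta_i)x)\,dx$.]

(b) If $n - k \to \infty$ holds, then $H^*_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\mathrm{sign}(\zeta_i)\min(1,|\zeta_i|)}$.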

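Theorem 38 (Adaptive soft-thresholding with consistent tuning) Suppose that for given $i \ge 1$
satisfying $i \le k = k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Set the scaling
factor $\alpha_{i,n} = (\xi_{i,n}\eta_{i,n})^{-1}$. Suppose that the true parameters $\theta^{(n)} = (\theta_{1,n},\ldots,\theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and
$\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

(a) Suppose $n-k$ is eventually constant equal to $m$, say. Then $\tilde H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly
to the distribution with cdf

$$\int_{\sqrt{|x\zeta_i|}}^{\infty}\rho_m(s)\,ds\;\mathbf{1}(-\zeta_i \le x < 0) + \mathbf{1}(x \ge 0)
= \Pr(\chi^2_m > m|x\zeta_i|)\,\mathbf{1}(-\zeta_i \le x < 0) + \mathbf{1}(x \ge 0)$$

in case $0 \le \zeta_i < \infty$, and to the distribution with cdf

$$\int_{0}^{\sqrt{|x\zeta_i|}}\rho_m(s)\,ds\;\mathbf{1}(0 \le x < -\zeta_i) + \mathbf{1}(x \ge -\zeta_i)
= \Pr(\chi^2_m \le m|x\zeta_i|)\,\mathbf{1}(0 \le x < -\zeta_i) + \mathbf{1}(x \ge -\zeta_i)$$

in case $-\infty < \zeta_i < 0$. Furthermore, $\tilde H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_0$ if $|\zeta_i| = \infty$. [In case
$|\zeta_i| < \infty$, the distribution has a jump of height $\int_{|\zeta_i|}^{\infty}\rho_m(s)\,ds = \Pr(\chi^2_m > m\zeta_i^2)$ at $x = -\zeta_i$ and is
otherwise absolutely continuous. In particular, it reduces to $\delta_0$ in case $\zeta_i = 0$.]

(b) If $n-k \to \infty$ holds, then

1. $|\zeta_i| \le 1$ implies that $\tilde H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\zeta_i}$.

2. $1 < |\zeta_i| < \infty$ implies that $\tilde H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-1/\zeta_i}$.

3. $|\zeta_i| = \infty$ implies that $\tilde H_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_0$.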



We know from Theorem 30 that we obtain the same limiting distributions for $\tilde\theta_{H,i}$, $\tilde\theta_{S,i}$, and
$\tilde\theta_{AS,i}$ as for $\hat\theta_{H,i}$, $\hat\theta_{S,i}$, and $\hat\theta_{AS,i}$, respectively, provided $n-k$ diverges to infinity sufficiently fast
in the sense that $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$. The theorems in this section now show that for the
soft-thresholding as well as for the adaptive soft-thresholding estimator we actually get the same
limiting distribution as in the known-variance case whenever $n-k$ diverges, even if
$n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ is violated. However, for the hard-thresholding estimator the picture is different,
and in case $n-k$ diverges but $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ is violated, limit distributions different
from the known-variance case arise (these limiting distributions still being convex combinations
of two pointmasses, but with weights different from the known-variance case). It seems that
this is a reflection of the fact that the hard-thresholding estimator is a discontinuous function
of the data, whereas the other two estimators considered depend continuously on the data.
The fixed-parameter limiting distributions for all three estimators are again the same as in the
known-variance case.

In the case where the degrees of freedom $n-k$ are eventually constant, the limiting distribution
of the hard-thresholding estimator is again a convex combination of two pointmasses, with weights
that are in general different from the known-variance case. However, for the soft-thresholding
as well as for the adaptive soft-thresholding estimator the limiting distributions can also contain
an absolutely continuous component. This component seems to stem from an interaction of the
more pronounced "bias-component" (as compared to hard-thresholding) with the nonvanishing
randomness in the estimated variance. The fixed-parameter limiting distributions for hard-thresholding
and adaptive soft-thresholding are again given by $\delta_0$ for all values of $\theta_i$ as in the
known-variance case, whereas for soft-thresholding the fixed-parameter limiting distribution is
$\delta_0$ only for $\theta_i = 0$ and otherwise has a pdf given by $\rho_m(-\mathrm{sign}(\theta_i)x)$ (as compared to a limit of
$\delta_{-\mathrm{sign}(\theta_i)}$ in the known-variance case).
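
To get a feeling for how much a fixed number of degrees of freedom changes the picture, one can evaluate the weight $\Pr(\chi^2_m > m\zeta_i^2)$ that Theorems 36(a) to 38(a) attach to the atom at $-\zeta_i$ and compare it with the weight obtained in Theorem 36(b) for $n-k \to \infty$ and $|\zeta_i| \neq 1$. The Python sketch below does this for even $m$ only, where the chi-square tail has an elementary finite-sum form; the restriction to even $m$ and all function names are our own simplifications, not part of the paper.

```python
import math


def chi2_sf_even(x: float, m: int) -> float:
    """Pr(chi^2_m > x) for even m: exp(-x/2) * sum_{j < m/2} (x/2)^j / j!."""
    if m <= 0 or m % 2 != 0:
        raise ValueError("this sketch only handles even, positive m")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, m // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def atom_weight(zeta: float, m: int) -> float:
    """Mass at -zeta when n - k = m is fixed: Pr(chi^2_m > m * zeta^2)."""
    return chi2_sf_even(m * zeta * zeta, m)


def atom_weight_limit(zeta: float) -> float:
    """Mass at -zeta for hard-thresholding when n - k diverges, |zeta| != 1."""
    if abs(zeta) == 1.0:
        raise ValueError("|zeta| = 1 requires the finer conditions of part (b)")
    return 1.0 if abs(zeta) < 1.0 else 0.0
```

For example, with $m = 4$ and $\zeta_i = 1$ the weight equals $3e^{-2} \approx 0.406$, whereas for diverging $n-k$ the weight jumps from 1 to 0 as $|\zeta_i|$ crosses 1.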

6.4 Consistent Tuning: Some Comments on Fixed-Parameter Large-Sample Distributions and the "Oracle-Property"

6.4.1 Hard-Thresholding and Adaptive Soft-Thresholding 

As already mentioned at the end of Sections 6.1 and 6.3.2, under consistent tuning the fixed-parameter
limiting distributions of the hard-thresholding and of the adaptive soft-thresholding
estimator - in the known-variance as well as in the unknown-variance case - always degenerate
to pointmass at zero. Recall that in these results the estimators (after centering at $\theta_i$) are scaled
by $(\xi_{i,n}\eta_{i,n})^{-1}$, which corresponds to the uniform convergence rate. We next show that
if the estimators are scaled by $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}$ instead, a limit distribution under fixed-parameter
asymptotics arises that is not degenerate in general (under an additional condition on the tuning
parameter in case of adaptive soft-thresholding). In fact, we show that the hard-thresholding as
well as the adaptive soft-thresholding estimators then satisfy what has been called the "oracle-property".
However, it should be kept in mind that - with this faster scaling sequence $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}$
- the centered estimators are no longer stochastically bounded in a moving-parameter framework
(for certain sequences of parameters), cf. Theorem 16. This shows the fragility of the "oracle-property",
which is a fixed-parameter concept, and calls into question the statistical significance
of this notion. For a more extensive discussion of the "oracle-property" and its consequences see
Leeb and Pötscher (2008), Pötscher and Leeb (2009), and Pötscher and Schneider (2009).

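Proposition 39 Let $0 < \sigma < \infty$ be given. Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$
for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$.

(a) $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{H,i} - \theta_i)$ as well as $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\hat\theta_{H,i} - \theta_i)$ converge in distribution to
$N(0,1)$ when $\theta_i \neq 0$, and to $\delta_0 = N(0,0)$ when $\theta_i = 0$.

(b) $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{AS,i} - \theta_i)$ as well as $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\hat\theta_{AS,i} - \theta_i)$ converge in distribution to
$N(0,1)$ when $\theta_i \neq 0$, and to $\delta_0 = N(0,0)$ when $\theta_i = 0$, provided the tuning parameter additionally
satisfies $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to 0$.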

Remark 40 Inspection of the proof of Part (b) given in Section 8.4 shows that the condition
$n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to 0$ is used for the result only in case $\theta_i \neq 0$. If now $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to \omega$ with $0 < \omega < \infty$,
inspection of the proof shows that then in case $\theta_i \neq 0$ we have that

$$\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{AS,i} - \theta_i) = Z_n - \sigma\omega\theta_i^{-1}(\hat\sigma/\sigma)^2 + o_p(1),$$

where $Z_n$ is standard normal and is independent of $\hat\sigma/\sigma$. Hence, we
see that the distribution of $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{AS,i} - \theta_i)$ asymptotically behaves like the convolution
of an $N(0,1)$-distribution and the distribution of $-\sigma\omega\theta_i^{-1}(n-k)^{-1}$ times a chi-square distributed
random variable with $n-k$ degrees of freedom (if $n-k \to \infty$ this reduces to an $N(-\sigma\omega\theta_i^{-1}, 1)$-distribution).
If $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to \infty$, then $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{AS,i} - \theta_i)$ is stochastically unbounded.
Note that this shows that the consistently tuned adaptive soft-thresholding estimator - even in a
fixed-parameter setting - has a convergence rate slower than $n^{1/2}\xi_{i,n}^{-1}$ if $\theta_i \neq 0$ and if the tuning
parameter is "too large" in the sense that $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to \infty$. The same conclusion applies to the
infeasible estimator $\hat\theta_{AS,i}$ (with the simplification that one always obtains an $N(-\sigma\omega\theta_i^{-1}, 1)$-distribution
in case $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to \omega$ with $0 < \omega < \infty$).
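
The approximation in Remark 40 can be made concrete through its first two moments: since $(n-k)(\hat\sigma/\sigma)^2$ is chi-square distributed with $n-k$ degrees of freedom and independent of $Z_n$, the approximating variable $Z - \sigma\omega\theta_i^{-1}\chi^2_{n-k}/(n-k)$ has mean $-\sigma\omega/\theta_i$ and variance $1 + 2(\sigma\omega/\theta_i)^2/(n-k)$. The following Python sketch computes these moments and compares them with a small Monte Carlo experiment. The function names, the seed, and the number of draws are illustrative assumptions of ours.

```python
import random


def approx_moments(sigma: float, omega: float, theta_i: float, df: int):
    """Mean and variance of Z - (sigma*omega/theta_i) * chi2_df / df,
    with Z ~ N(0,1) independent of chi2_df."""
    shift = sigma * omega / theta_i
    mean = -shift                          # E[chi2_df / df] = 1
    var = 1.0 + shift * shift * 2.0 / df   # Var[chi2_df / df] = 2 / df
    return mean, var


def simulate_moments(sigma: float, omega: float, theta_i: float, df: int,
                     draws: int = 200_000, seed: int = 0):
    """Monte Carlo estimate of the same mean and variance."""
    rng = random.Random(seed)
    shift = sigma * omega / theta_i
    xs = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df d.o.f.
        xs.append(z - shift * chi2 / df)
    mean = sum(xs) / draws
    var = sum((x - mean) ** 2 for x in xs) / (draws - 1)
    return mean, var
```

As the degrees of freedom grow, the variance returned by `approx_moments` tends to 1, in line with the $N(-\sigma\omega\theta_i^{-1}, 1)$ limit mentioned in the remark.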

We further illustrate the fragility of the fixed-parameter asymptotic results under a $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}$-scaling
obtained above by providing the moving-parameter limits under this scaling. Let $\hat F_{H,n,\theta,\sigma} :=
\hat F_{H,i,n,\theta,\sigma}$ denote the cdf of $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\hat\theta_{H,i} - \theta_i)$, and define $\hat F_{S,n,\theta,\sigma}$ and $\hat F_{AS,n,\theta,\sigma}$ analogously.

The proofs of the subsequent propositions are completely analogous to the proofs of Theorem 9
in Pötscher and Leeb (2009) and Theorem 5 in Pötscher and Schneider (2009), respectively.

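Proposition 41 (Hard-thresholding) Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$
for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n},\ldots,\theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$ and
$\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$. [Note that in case $\zeta_i \neq 0$ the convergence of $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})$
already follows from that of $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})$, and $\nu_i$ is then given by $\nu_i = \mathrm{sign}(\zeta_i)\infty$.]

1. Suppose $|\zeta_i| < 1$. Then $\hat F_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\nu_i}$ if $|\nu_i| < \infty$; if $|\nu_i| = \infty$ the
total mass of $\hat F_{H,n,\theta^{(n)},\sigma_n}$ escapes to $-\nu_i$, in the sense that $\hat F_{H,n,\theta^{(n)},\sigma_n}(x) \to 0$ for every $x \in \mathbb{R}$
if $\nu_i = -\infty$, and that $\hat F_{H,n,\theta^{(n)},\sigma_n}(x) \to 1$ for every $x \in \mathbb{R}$ if $\nu_i = \infty$.

2. Suppose $|\zeta_i| > 1$. Then $\hat F_{H,n,\theta^{(n)},\sigma_n}$ converges weakly to $\Phi$.

3. Suppose $|\zeta_i| = 1$ and $n^{1/2}\left(\eta_{i,n} - \zeta_i\theta_{i,n}/(\sigma_n\xi_{i,n})\right) \to r_i$ for some $r_i \in \bar{\mathbb{R}}$. Then
$\hat F_{H,n,\theta^{(n)},\sigma_n}(x)$ converges to

$$\Phi(r_i)\mathbf{1}(\zeta_i = 1) + \int_{-\infty}^{x}\phi(t)\mathbf{1}(\zeta_i t > r_i)\,dt$$

for every $x \in \mathbb{R}$. [In case $r_i = -\infty$ the limit reduces to a standard normal distribution.]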

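Proposition 42 (Adaptive soft-thresholding) Suppose that for given $i \ge 1$ satisfying $i \le k =
k(n)$ for large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n},\ldots,\theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \to \zeta_i \in \bar{\mathbb{R}}$.

1. If $\zeta_i = 0$ and $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \mathbb{R}$, then $\hat F_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\nu_i}$.

2. The total mass of $\hat F_{AS,n,\theta^{(n)},\sigma_n}$ escapes to $\infty$ or $-\infty$ in the following cases: If $-\infty < \zeta_i < 0$,
or if $\zeta_i = 0$ and $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to -\infty$, or if $\zeta_i = -\infty$ and $n^{1/2}\eta_{i,n}^2\xi_{i,n}\sigma_n/\theta_{i,n} \to -\infty$, then
$\hat F_{AS,n,\theta^{(n)},\sigma_n}(x) \to 0$ for every $x \in \mathbb{R}$. If $0 < \zeta_i < \infty$, or if $\zeta_i = 0$ and $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \infty$,
or if $\zeta_i = \infty$ and $n^{1/2}\eta_{i,n}^2\xi_{i,n}\sigma_n/\theta_{i,n} \to \infty$, then $\hat F_{AS,n,\theta^{(n)},\sigma_n}(x) \to 1$ for every $x \in \mathbb{R}$.

3. If $|\zeta_i| = \infty$ and $n^{1/2}\eta_{i,n}^2\xi_{i,n}\sigma_n/\theta_{i,n} \to w_i \in \mathbb{R}$, then $\hat F_{AS,n,\theta^{(n)},\sigma_n}$ converges weakly to
$\Phi(\cdot + w_i)$.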



It is easy to see that setting $\theta_{i,n} = \theta_i$ and $\sigma_n = \sigma$ in Proposition 41 immediately recovers
the "oracle-property" for $\hat\theta_{H,i}$. Similarly, we recover the "oracle property" for $\hat\theta_{AS,i}$ from
Proposition 42 provided $n^{1/2}\xi_{i,n}\eta_{i,n}^2 \to 0$. The propositions also characterize the sequences of
parameters along which the mass of the distributions of the hard-thresholding and the adaptive
soft-thresholding estimator escapes to infinity; loosely speaking these are sequences along which
the bias of the estimators exceeds all bounds.

The theorems in Section 6.2 also show that the last two propositions above carry over
immediately to the unknown-variance case whenever $n-k \to \infty$ sufficiently fast such that
$n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ holds. To save space, we do not extend these two propositions to the
case where the latter condition fails to hold.

6.4.2 Soft-Thresholding

The situation is somewhat different for the soft-thresholding estimator. It follows from Theorem
37 that the distribution of $\sigma^{-1}(\xi_{i,n}\eta_{i,n})^{-1}(\tilde\theta_{S,i} - \theta_i)$ does not degenerate to pointmass at zero
(in fact, has no mass at zero) if $\theta_i \neq 0$ and is held fixed. Consequently, $(\xi_{i,n}\eta_{i,n})^{-1}$ is also
the fixed-parameter convergence rate of $\tilde\theta_{S,i}$, in the sense that scaling with a faster rate (e.g.,
$n^{1/2}\xi_{i,n}^{-1}$) leads to the escape of the total mass of the finite-sample distribution of the so-scaled
(and centered) estimator to $-\mathrm{sign}(\theta_i)\infty$. For $\theta_i = 0$ we get with the same argument as for
hard-thresholding that $\sigma^{-1}n^{1/2}\xi_{i,n}^{-1}(\tilde\theta_{S,i} - \theta_i)$ converges to $\delta_0$. For the infeasible version $\hat\theta_{S,i}$
the situation is identical. We conclude by a result analogous to Propositions 41 and 42. The
proof of this result is completely analogous to the proof of Theorem 10 in Pötscher and Leeb
(2009).

Proposition 43 (Soft-thresholding) Suppose that for given $i \ge 1$ satisfying $i \le k = k(n)$ for
large enough $n$ we have $\xi_{i,n}\eta_{i,n} \to 0$ and $n^{1/2}\eta_{i,n} \to \infty$. Suppose that the true parameters
$\theta^{(n)} = (\theta_{1,n},\ldots,\theta_{k_n,n}) \in \mathbb{R}^{k_n}$ and $\sigma_n \in (0,\infty)$ satisfy $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i \in \bar{\mathbb{R}}$. Then
$\hat F_{S,n,\theta^{(n)},\sigma_n}$ converges weakly to $\delta_{-\nu_i}$ if $|\nu_i| < \infty$; and if $|\nu_i| = \infty$, the total mass of $\hat F_{S,n,\theta^{(n)},\sigma_n}$
escapes to $-\nu_i$, in the sense that $\hat F_{S,n,\theta^{(n)},\sigma_n}(x) \to 0$ for every $x \in \mathbb{R}$ if $\nu_i = -\infty$, and that
$\hat F_{S,n,\theta^{(n)},\sigma_n}(x) \to 1$ for every $x \in \mathbb{R}$ if $\nu_i = \infty$.

Again, this proposition immediately extends to the unknown-variance case whenever $n-k \to
\infty$ sufficiently fast such that $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ holds. We abstain from extending the
result to the case where the latter condition fails to hold.

6.5 Remarks 

Remark 44 (i) The convergence conditions on the various quantities involving $\theta_{i,n}$ and $\sigma_n$ (and
on $n-k$) in the propositions in Sections 6.1 and 6.4 as well as in the theorems in Section 6.3
are essentially cost-free for the same reason as explained in Remark 12.

(ii) We note that all possible forms of the moving-parameter limiting distributions in the
results in this section already arise for sequences $\theta_{i,n}$ belonging to an arbitrarily small neighborhood
of zero (and with $\sigma > 0$ fixed). Consequently, the non-uniformity in the convergence to the
fixed-parameter limits is of a local nature.

Remark 45 Pötscher and Leeb (2009) and Pötscher and Schneider (2009) present impossibility
results for estimating the finite-sample distribution of the thresholding estimators considered in
these papers. In the present context, corresponding impossibility results could be derived under
appropriate assumptions. We abstain from presenting such results.



7 Numerical Study 

As has been discussed in Remarks 1 and 2 in Section 2, the soft-thresholding estimator coincides
with the Lasso, and the adaptive soft-thresholding estimator coincides with the adaptive Lasso
in case of orthogonal design. A natural question now is whether the distributional results for the
(adaptive) soft-thresholding estimator derived in this paper are in any way indicative of the
distribution of the (adaptive) Lasso in case of non-orthogonal design. In order to gain some
insight into this we provide a simulation study to compare the finite-sample distributions of the
respective estimators.
We simulate the Lasso estimator as defined in Remark 1 (with $\eta'_{i,n} = \eta_{i,n}\xi_{i,n}^{-1}$ and $\eta_{i,n} = \eta_n$
not depending on $i$) and the adaptive Lasso estimator as defined in Remark 2 (with $\eta'_{i,n} = \eta_n$
not depending on $i$) and show histograms of $n^{1/2}\sigma^{-1}\xi_{i,n}^{-1}(\hat\theta_i - \theta_i)$ where $\hat\theta_i$ stands for the $i$-th
component of Lasso or adaptive Lasso. [The scaling used here is chosen on the basis that with
this scaling the $i$-th component of the least-squares estimator is standard normally distributed.]
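
For readers who want to reproduce the thresholding side of the comparison, the three estimators can be written as functions of a single least-squares component and a threshold $t$ (which plays the role of $\hat\sigma\xi_{i,n}\eta_{i,n}$, or of $\sigma\xi_{i,n}\eta_{i,n}$ for the infeasible versions). The Python sketch below assumes the usual componentwise forms: keep-or-kill for hard-thresholding, shrinkage by $t$ for soft-thresholding, and the factor $(1 - t^2/\hat\theta_{LS,i}^2)_+$ for adaptive soft-thresholding. The definitions themselves are given in Section 2 of the paper, outside this excerpt, so the exact forms used here should be read as our assumption.

```python
def hard_threshold(ls: float, t: float) -> float:
    """Keep the least-squares value if it exceeds the threshold, else 0."""
    return ls if abs(ls) > t else 0.0


def soft_threshold(ls: float, t: float) -> float:
    """Shrink the least-squares value towards zero by t (0 if |ls| <= t)."""
    if abs(ls) <= t:
        return 0.0
    return ls - t if ls > 0 else ls + t


def adaptive_soft_threshold(ls: float, t: float) -> float:
    """Multiply the least-squares value by (1 - t^2 / ls^2)_+ ."""
    if abs(ls) <= t:
        return 0.0
    return ls * (1.0 - (t * t) / (ls * ls))
```

All three functions return zero on the same event $|\hat\theta_{LS,i}| \le t$, and each differs from the least-squares value by at most $t$ in absolute value.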

We set $n = 8$ and $k = 4$, resulting in $n-k = 4$ degrees of freedom. Two different types of
designs are considered: for Design I we use $X'X = n\Omega(\rho)$ with $\Omega(\rho)_{ij} = \rho^{|i-j|}$. More concretely,
$X$ is partitioned into $d = n/k = 2$ blocks of size $k \times k$ and each of these blocks is set equal to
$k^{1/2}L$ with $LL' = \Omega(\rho)$, the Cholesky factorization of $\Omega(\rho)$. The value of $\rho$ is set equal to 0.3,
0.5, and 0.9, implying condition numbers for $X'X$ of 2.7, 5.6, and 57.0, respectively. Design
II is an "equicorrelated" design. Here we set the matrix comprised of the first $k$ rows of $X$
equal to $I_k + cE_k$, where $E_k$ is the $k \times k$ matrix with all components equal to 1 and $c$ is a real
number greater than $-1/k = -0.25$. The remaining entries of $X$ are all set equal to 0. We
choose three values for $c$: first, $c = 0.2$ which implies a correlation of 0.36 between any two
regressors and a condition number of 3.2 for $X'X$; second, $c = 2$ which implies a correlation of
0.952 and a condition number of 81; and $c = -0.2$ which implies a correlation of $-0.32$ and a
condition number of 25. For either type of design we proceed as follows: For the given parameters
$\theta = (3, 1.5, 0, 0)'$ and $\sigma = 1$, we simulate 10,000 data vectors $Y$ and compute the corresponding
estimator, i.e., the Lasso and adaptive Lasso as specified above. We set $\eta_n = n^{-1/2}\Phi^{-1}(0.975)$,
implying that the thresholding estimators delete a given irrelevant variable with probability 0.95.
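
The Design II figures quoted above can be checked by direct calculation, because $X'X = (I_k + cE_k)^2 = I_k + (2c + c^2k)E_k$ has eigenvalue $(1+ck)^2$ in the direction of the all-ones vector and eigenvalue 1 otherwise. The Python sketch below reproduces the correlations and condition numbers, and also the deletion probability implied by $\eta_n = n^{-1/2}\Phi^{-1}(0.975)$ under the known-variance normal law. Function names are ours, and "correlation" is interpreted as the normalized off-diagonal entry of $X'X$, which is our reading of the text.

```python
from statistics import NormalDist


def design_ii_summary(c: float, k: int = 4):
    """Correlation between two regressors and condition number of X'X
    when the first k rows of X equal I_k + c*E_k and the rest are zero."""
    if 1.0 + c * k <= 0.0:
        raise ValueError("c must exceed -1/k")
    shared = 2.0 * c + c * c * k            # off-diagonal entry of X'X
    correlation = shared / (1.0 + shared)   # diagonal entry is 1 + shared
    top = (1.0 + c * k) ** 2                # eigenvalue along the ones vector
    condition = max(top, 1.0) / min(top, 1.0)  # other eigenvalues equal 1
    return correlation, condition


def deletion_probability(n: int, level: float = 0.975) -> float:
    """Pr(|Z| <= n^{1/2} * eta_n) for eta_n = n^{-1/2} * Phi^{-1}(level)."""
    std = NormalDist()
    eta_n = std.inv_cdf(level) / n ** 0.5
    cutoff = n ** 0.5 * eta_n
    return std.cdf(cutoff) - std.cdf(-cutoff)
```

Calling `design_ii_summary` with `c` equal to 0.2, 2, and -0.2 gives correlations of about 0.36, 0.952, and -0.32 and condition numbers 3.24, 81, and 25, matching the values in the text up to rounding.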

For the non-zero outcomes of the estimators, we plot the histogram of $n^{1/2}\sigma^{-1}\xi_{i,n}^{-1}(\hat\theta_i - \theta_i)$
which is normalized such that its mass corresponds to the proportion of the non-zero values. The
zero values are accounted for by plotting "pointmass" with height representing the proportion
of zero values, i.e., the simulated variable selection probability. For the purpose of comparison
the graph of the distribution of the corresponding (centered and scaled) thresholding estimator
(using the same $\eta_{i,n} = \eta_n$) as derived analytically in Section 5 is then superimposed in red color.
The results of the simulation study are presented in Figures 1-12 below.

In comparing the adaptive Lasso with the adaptive soft-thresholding estimator, we find re- 
markable agreement between the respective marginal distributions in all cases where the design 
matrix is not too multicollinear, see Figures 1, 2, and 4. For the cases where the design matrix 
is no longer well-conditioned a difference between the respective marginal distributions emerges 
but seems to be surprisingly moderate, see Figures 3, 5, and 6. 

Turning to the Lasso and its thresholding counterpart, we find a similar situation with a 
somewhat stronger disagreement between the respective marginal distributions. Again in the 
cases where the design matrix is well-conditioned (Figures 7, 8, and 10) the difference is less 
pronounced than in the case of an ill-conditioned design matrix (Figures 9, 11, and 12). 

We have also experimented with other values of $n$, $k$, $\theta$, $\rho$, $c$, and $\eta_n$ and have found the
results to be qualitatively the same for these choices.





Figure 2: Adaptive Lasso, Design I: $\rho = 0.5$

Figure 3: Adaptive Lasso, Design I: $\rho = 0.9$

Figure 4: Adaptive Lasso, Design II: $c = 0.2$

Figure 5: Adaptive Lasso, Design II: $c = 2$

Figure 6: Adaptive Lasso, Design II: $c = -0.2$

Figure 7: Lasso, Design I: $\rho = 0.3$

Figure 8: Lasso, Design I: $\rho = 0.5$

Figure 12: Lasso, Design II: $c = -0.2$



8 Proofs 

8.1 Proofs for Section 3

Proof of Proposition: We first prove Part (a). Rewrite $P_{n,\theta,\sigma}(\hat\theta_i = 0)$ as

$$\Phi\left(n^{1/2}\xi_{i,n}^{-1}\left(-\theta_i/\sigma + \xi_{i,n}\eta_{i,n}\right)\right) - \Phi\left(n^{1/2}\xi_{i,n}^{-1}\left(-\theta_i/\sigma - \xi_{i,n}\eta_{i,n}\right)\right). \qquad (25)$$

Assume first that $\xi_{i,n}\eta_{i,n} \to 0$ and fix $\theta_i \neq 0$. By a standard subsequence argument we may
assume without loss of generality that $n^{1/2}\xi_{i,n}^{-1}$ converges to a constant $\kappa$ which by our maintained
assumption must satisfy $0 < \kappa \le \infty$. Now $-\theta_i/\sigma \pm \xi_{i,n}\eta_{i,n}$ both converge to $-\theta_i/\sigma$, which is
non-zero, and consequently both arguments in (25) converge to $-\kappa\theta_i/\sigma$. Since $\Phi$ is continuous on
$\bar{\mathbb{R}}$, the expression (25) converges to zero. To prove the converse, now assume that (25) converges
to zero for all $\theta_i \neq 0$. By a standard subsequence argument, we may assume without loss of
generality that $\xi_{i,n}\eta_{i,n}$ converges to a constant $\chi$ satisfying $0 \le \chi \le \infty$. Suppose $\chi > 0$ holds.
Choose $\theta_i$ such that $0 < -\theta_i/\sigma < \chi$ holds. It follows that $-\theta_i/\sigma + \xi_{i,n}\eta_{i,n}$ and $-\theta_i/\sigma - \xi_{i,n}\eta_{i,n}$
eventually have opposite signs and are bounded away from zero. By our maintained assumption
the same is then true for the arguments in (25), leading to a contradiction. Hence $\chi = 0$ must
hold, completing the proof of Part (a). Parts (b) and (c) are obvious since $P_{n,\theta,\sigma}(\hat\theta_i = 0) =
\Phi(n^{1/2}\eta_{i,n}) - \Phi(-n^{1/2}\eta_{i,n})$ whenever $\theta_i = 0$. ■

Proof of Proposition [sj Part (a) follows immediately from (|4| and the assumptions. To 
prove Part (b) we use ^ to write 

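$$P_{n,\theta^{(n)},\sigma_n}(\hat\theta_i = 0) = \Phi\left(n^{1/2}\eta_{i,n}\left(1 - \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\right)\right) - \Phi\left(n^{1/2}\eta_{i,n}\left(-1 - \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\right)\right).$$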

The first and the second claim then follow immediately. For the third claim, assume first that 
C, = 1. Then 

PnM-Ka^ = O) = $ (^^^^ _ C,0.,„/(^n^., J)) 

The case $\zeta_i = -1$ is handled analogously. ■

Proof of Proposition 10: We prove Part (b) first. Observe that

Pn,e,a (O. = O) 



{s)ds 



By a subsequence argument it suffices to prove the result under the assumption that $n-k =
n-k(n)$ converges in $\mathbb{N} \cup \{\infty\}$. If the limit is finite, then $n-k(n)$ is eventually constant and
the result follows since every $t$-distribution has unbounded support. If $n-k \to \infty$ then



< P, 



2IIT, 



11 — k 



$11 



{e^ = O) < $ (ni/2ry,,„) - $ „) + 2 ||r„_fc - $||^ , 



where $\|\cdot\|_\infty$ denotes the supremum norm. Since $\|T_{n-k} - \Phi\|_\infty \to 0$ if $n-k \to \infty$ by Polya's
Theorem, the result follows. Part (c) is proved analogously.

We next prove Part (a). Observe that the collection of distributions corresponding to
$\{\rho_m : m \in \mathbb{N}\}$ is tight on $(0,\infty)$, meaning that for every $0 < \delta < 1$ there exist $0 < c_*(\delta) <
c^*(\delta) < \infty$ such that $\sup_{m\in\mathbb{N}}\int_0^{c_*(\delta)}\rho_m(s)\,ds < \delta$ and $\sup_{m\in\mathbb{N}}\int_{c^*(\delta)}^{\infty}\rho_m(s)\,ds < \delta$. Note that the map



s i-^ Pn 



) is monotonically nondecreasing. Hence, 



{l-5)Pn^e,a{eM6)mn)=^ 



< 



< Pn. 



< Pn 
Since $\xi_{i,n}c_*(\delta)\eta_{i,n}$ ($\xi_{i,n}c^*(\delta)\eta_{i,n}$, respectively) converges to zero if and only if $\xi_{i,n}\eta_{i,n}$ does so,
Part (a) follows from Proposition [i] applied to the estimators §i(c* (J)??, „)and 9i{c* {5)r]^^^). ■ 

Proof of Theorem (a) Set P„(s) = Pn.e'-^\a„ = for s > 0. By Proposition 

[Sjwe have that P„(s) converges to P{s) for all s > 0, where P[s) — $ [—Vi + sci) — $ {—I'i — sci) 
for s > 0. Since Pn{s) as well as P(s) are continuous functions of s, are monotonically nonde- 
creasing in s, and have the property that their limits for s — > are while the limits for s — > cx) 
are 1, it follows from Polya's Theorem that the convergence is uniform in s. But then using ([5| 
gives 







< sup |P„(s) - P{s)\ / p„_fc(s)ds - sup |P„(s) - P(s 

s>0 Jo s>0 



as n — cx). This completes the proof in case n — k = m eventually; in case n — k-^oo observe that 
($ {—i^i + set) — $ (— i/i — sCi)) p^_f,{s)ds then converges to $ (— i^i + e^) — $ (— j/^ — e^) as 

the distribution corresponding to yO„_j; converges weakly to pointmass at s = 1 and the integrand 

is bounded and continuous. 

(b) Observe that P„.e(n)^cr„ i^ii^'Hi^n) — converges to 1 for s > ICJ and to for s < \Ci\ 

by Proposition [s] applied to the estimator 6i{sri^ „). Now (j5| and dominated convergence deliver 
the result in (bl). 

Next consider (b2): Suppose first that j^J < 1. Choose £ > small enough such that 
ICil + e < 1. Then, recalling that P„ {^iisrii „) = 0^ is monotonically nondecreasing in s, 

eq. ^ gi\ 



Jives 

/>oo 

p 



> p„ 



IC.I+£ 



Now the integral on the r.h.s. converges to 1 since jCJ + e < 1, and the probability on the 
r.h.s. converges to 1 by Proposition [s] applied to the estimator ^i((|Cil + e) ^i.n)- This completes 
the proof for the case jCJ < 1- Next assume that > 1- Choose e > small enough such that 
ICJ — e > 1 holds. Then from ([5| we have 



P., 



< ^«,e("),.„ (^.((IC^I -e) =0) + / Pn-Mds 



since P„(s) is monotonically nondecreasing in s and /g^'' ^ p„_^(s)(is is not larger than 1. Since 
ICJ — e > 1 and rt — fc — > oo the second term on the r.h.s. goes to zero, while the first term goes 
to zero by Proposition s] applied to the estimator ^i((|Cil — 

Next we prove 3. and 4. and assume $\zeta_i = 1$ first. Then using eq. (5) and performing the substitution
$s - 1 = (2(n-k))^{-1/2}t$ we obtain (recalling that $\rho_{n-k}$ is zero for negative arguments and
using the abbreviations $r_{i,n} = n^{1/2}\left(\eta_{i,n} - \theta_{i,n}/(\sigma_n\xi_{i,n})\right)$ and $r^*_{i,n} = n^{1/2}\left(-\eta_{i,n} - \theta_{i,n}/(\sigma_n\xi_{i,n})\right)$)

$$P_{n,\theta^{(n)},\sigma_n}(\tilde\theta_i = 0)
= \int_{-\infty}^{\infty}\left[\Phi\left(r_{i,n} + n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}t\right) - \Phi\left(r^*_{i,n} - n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}t\right)\right]
(2(n-k))^{-1/2}\rho_{n-k}\left((2(n-k))^{-1/2}t + 1\right)dt$$

$$= \int_{-\infty}^{\infty}\left[\Phi\left(r_{i,n} + n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}t\right) - \Phi\left(r^*_{i,n} - n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}t\right)\right]\phi(t)\,dt + o(1).$$

The indicated term in the above display is $o(1)$ by the Lemma in the Appendix and because the
expression in brackets inside the integral is bounded by 1. Since $r_{i,n} \to r_i$ and $r^*_{i,n} \to -\infty$, the
integrand converges to $\Phi(r_i)$ under 3. and to $\Phi(r_i + d_i t)$ under 4. The dominated convergence
theorem then completes the proof. The case $\zeta_i = -1$ is treated similarly.

It remains to prove 5. Again assume $\zeta_i = 1$ first. Define $r'_{i,n} = 2^{1/2}n^{-1/2}\eta_{i,n}^{-1}(n-k)^{1/2}r_{i,n}$
and $r''_{i,n} = 2^{1/2}n^{-1/2}\eta_{i,n}^{-1}(n-k)^{1/2}r^*_{i,n}$ and rewrite the above display as

$$\int_{-\infty}^{\infty}\left[\Phi\left(n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}(r'_{i,n} + t)\right) - \Phi\left(n^{1/2}\eta_{i,n}(2(n-k))^{-1/2}(r''_{i,n} - t)\right)\right]\phi(t)\,dt + o(1).$$

Observe that $r'_{i,n} \to r'_i$ and $r''_{i,n} \to -\infty$. The expression in brackets inside the integral hence
converges to 1 for $t > -r'_i$ and to 0 for $t < -r'_i$. By dominated convergence the integral converges
to $\int_{-r'_i}^{\infty}\phi(t)\,dt = \Phi(r'_i)$. The case $\zeta_i = -1$ is treated similarly. ■


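Proof of Proposition 13: Observe that

$$\left|P_{n,\theta,\sigma}(\tilde\theta_i = 0) - P_{n,\theta,\sigma}(\hat\theta_i = 0)\right|
\le \int_0^{\infty}\Big\{\left|\Phi\left(n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) + s\eta_{i,n})\right) - \Phi\left(n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n})\right)\right|$$
$$\qquad + \left|\Phi\left(n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n})\right) - \Phi\left(n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) - s\eta_{i,n})\right)\right|\Big\}\rho_{n-k}(s)\,ds. \qquad (26)$$

By a trivial modification of Lemma 13 in Pötscher and Schneider (2010) we conclude that for
every $\varepsilon > 0$ there exists a real number $c = c(\varepsilon) > 0$ such that

$$\int_{|s-1| > (n-k)^{-1/2}c}\rho_{n-k}(s)\,ds < \varepsilon$$

for every $n > k$. Using the fact that $\Phi$ is globally Lipschitz with constant $(2\pi)^{-1/2}$, this gives

$$\sup_{\theta\in\mathbb{R}^k,\,0<\sigma<\infty}\left|P_{n,\theta,\sigma}(\tilde\theta_i = 0) - P_{n,\theta,\sigma}(\hat\theta_i = 0)\right|
\le 2\int_{|s-1| > (n-k)^{-1/2}c}\rho_{n-k}(s)\,ds + 2(2\pi)^{-1/2}n^{1/2}\eta_{i,n}\int_{|s-1| \le (n-k)^{-1/2}c}|s-1|\rho_{n-k}(s)\,ds$$
$$\le 2\varepsilon + 2(2\pi)^{-1/2}n^{1/2}\eta_{i,n}(n-k)^{-1/2}c,$$

which proves the result since $\varepsilon$ can be made arbitrarily small. ■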



8.2 Proofs for Section 4



Proof of Theorem 16; (a) Observe that 



(27) 



holds for any of the estimators. Hence, consistency of 9i under ^rj^ „ ^ and 
follows immediately from Proposition 15 a) since the distributions oi a/a are tight. Conversely, 

suppose Oi is consistent. Then clearly Pn,e,<y {Oi = 0^ — whenever 9i ^ Q must hold, which 

implies £,inVi n ~^ by Proposition 10 a). This then entails consistency of O^s.i by (|27 [) and 
tightness of the distributions of a/a; this in turn implies ^ by Proposition IsFa). 

(b) Since ai^„ — cx), it suffices to prove the second claim in (b). Now for every real M > 
we have 



P, 



'HA 



'LSa 



> aM^ 
> aM, 



+l{a,,n\9,\ >aM)P^^e.,a[ 



'LS,i 



> 



'LS.i 



'LS,i 



< P. 



This gives 



'LS,i 



>aM]+l {a„, \0i\ > aM) F„,e. 

> ctm) + 1 (a,,„ \0,\ > aM) P^^e.a ( 



'H,i 



sup sup sup P„,e,<T [ai.n 

nSN egR'= 0<<T<oo ^ 

< sup sup sup Pn,e.a ({n^^^/^i,n) 
nSN egR'= 0<<T<oo ^ 



> aM 



'LSA 



> aM 



sup sup sup Pn.e,a[ OLS,i < ^^i nVi n) 

neNO<a<oo egEfc:|ei|>o-M/a- ^ ' ' / 



where the first term on the r.h.s. can be made arbitrarily small in view of Proposition 15 'b) by 
choosing M large enough. The second term on the r.h.s. can be written as (cf ([5|) 



sup sup sup 

neNO<cr<oo e£TSL'':\ei\>aM/ai,r, J Q 



P 



< sup sup 



sup 



neNO<cr<ooJo eeR'':\ei\>cTM/ai 



'n,e,<T 



'LS,i 



'LS,i 



For e > choose c*(e/2) as in the proof of Proposition 10 Using continuity of <i> and the fact 
that the probability appearing on the r.h.s. above is monotonically increasing as \9i\ approaches 
< sup

new Jo 



crM/ai^n from above, this can be further bounded by 

< e/2 + sup/ '^[sn'^\n-Ma-y/y^,Jp„_kis)ds 

< e/2 + sup$ (ni/^^r^a-i (c* (e/2)e,,„ry,,„a,,„ - M)) < e/2 + $ (c*(e/2) - M) , 

the last inequahty holding for M > c*{e/2) and since n^^'^^Zn'^in — 1 ^'^'^ ^i,n''li,n^i,n < 1- 
Choosing M sufficiently large (depending on e) completes the proof for 6H,i- Next observe that 



< 



a min (^n^ 



/2 



1 < <T 



and similarly ai_„ 9^ i — Oj^ss < o" hold. Since the set of distributions of a /a (i.e., the set of 
distributions corresponding to Pn-k) tight as already noted, this proves (b) then also for ^5^^ 
and OAS,i- 

(c) By a subsequence argument we can reduce the argument to the case where n^^'^r]^ „ — )■ 
Bi G M. and n — k converges in N U {00}. Suppose first that = 00: Observe that then 
^i," = (Ci.n'yi.n)"^ eventually. Choose 9i^n and ct„ such that 9i^n/ (CT„Ci,n'7i,n) = where Ci 
does not depend on n and < jCJ < 1 holds, and set the other coordinates of 0*-"'' to arbitrary 
values (e.g., equal to zero). Observe that there exists a constant S > such that 



lim inf P 

n— >oo 



= 0)>S 



(28) 



holds: If n — fc converges to a finite limit, i.e., is eventually constant, the claim follows from 
Theorem [Tl|bl); if n — fc — >■ 00, then use Theorem [TT|b2). By (|6| we have for e = 6 and a 
suitable M that 



> Pn 
= P„ 



I /a„ > M) P„,e(.),,„ - 0) >(51 (|6,,„0,,„| /a„ > Af) 



for all n sufficiently large. But this is only possible if 6i,„^j „?7j „ < M/ \(^\ < 00 holds eventually, 
implying that 6^ „ = 0(ai^„). Next consider the case where < < 00: Observe that then „ 
is of the same order as n^/^/^^^. Then define and cr„ such that n^/^0i^„/ (cn^j = 

where does not depend on n and < < 00 holds, and set the other coordinates of 6''"-' 
to arbitrary values (e.g., equal to zero). Observe that then (28) also holds, in view of Theorem 
11 al) in case n — A: is eventually constant, and in view of Theorem |ll[ a2) in case n — k ^ 00. 



The rest of the proof is then similar as before. It remains to consider the case = 0: It follows 
from (27), the assumptions on „ and 77, „, from — 0, and from the observation that 0Ls,i is 

Af(6'j, cr^^^„/n)-distributed, that n^^^C^n'^^^ i^i ~ (^ij converges in distribution to a standard 

normal distribution for each fixed 9i and a. Hence, stochastic boundedness of cr^^bi^n 

for each 9i (and a fortiori ml) necessarily implies that 6i.„ = 0{n^^^^~^) — 0(ai.„). 
(d) The proof for $\hat\theta_i$ is similar and in fact simpler: note that now $|\hat\theta_i - \hat\theta_{LS,i}| \le \sigma\xi_{i,n}\eta_{i,n}$
holds and that in the proof of (b) the integration over $s$ can simply be replaced by evaluation at
s = 1. For (c) one uses Proposition [s] instead of Theorem 



11 



8.3 Proofs for Section 5



Proofs of Propositions 19, 20, and 21 Observe that 
and that QLS,i/ [<yS,i^n) is N (6'i/(cr^j „), Furthermore, we have 

Identifying $\hat\theta_{LS,i}/(\sigma\xi_{i,n})$ and $\theta_i/(\sigma\xi_{i,n})$ with $\bar y$ and $\theta$ in Pötscher and Leeb (2009) and making
use of eq. (4) in that reference immediately gives the result for $d\hat H_{H,n,\theta,\sigma}$. The result for $\hat H_{H,n,\theta,\sigma}$
then follows from elementary calculations.

The result for $d\hat H_{S,n,\theta,\sigma}$ follows similarly by making use of eq. (5) instead of eq. (4) in Pötscher
and Leeb (2009). The result for $\hat H_{S,n,\theta,\sigma}$ then follows from elementary calculations.

The results for $d\hat H_{AS,n,\theta,\sigma}$ and $\hat H_{AS,n,\theta,\sigma}$ follow similarly by making use of eqs. (9)-(11) in
Pötscher and Schneider (2009). ■



Proofs of Propositions 23, 24, and 25: We have 



Pn,e,a cr ai^n{9H,i - 9 i) < X \ a = s(T \ p^^_k{s)ds 



H 



.eA^)Pn-kis)ds, 



where we have used independence of a and ^ls.i allowing us to replace a by sa in the relevant 
formulae, cf. Leeb and Potscher (2003, p. 110). Substituting (fzl), with ry, „ replaced by srj^ „, into 



the above equation gives (12 1. Representing H\j nSai^ ^ integral of dH\j 
given in ([s]) and applying Fubini's theorem then gives (13). 
Similarly, we have 



/"OO 

Jo 



eA^)Pn-k{s)ds. 



Substituting (^9L with -q^ „ replaced by srj^ „, into the above equation and noting that $(a - 
bs)p^{s)ds — T^-a{b) gives (14). Elementary calculations then yield (15). 
Finally, we have 



Substituting (11), with 77^ „ replaced by S77j„, into the above equation gives (16). Elementary 
calculations then yield (17). ■ 






8.4 Proofs for Section 6



Proof of Proposition 27: The proof of (a) is completely analogous to the proof of Theorem
4 in Pötscher and Leeb (2009), whereas the proof of (b) is analogous to the proof of Theorem 17
in the same reference. ■

Proof of Proposition 28: The proof of (a) is completely analogous to the proof of Theorem
5 in Pötscher and Leeb (2009), whereas the proof of (b) is analogous to the proof of Theorem 18
in the same reference. ■

Proof of Proposition 29: The proof of (a) is completely analogous to the proof of Theorem
4 in Pötscher and Schneider (2009), whereas the proof of (b) is analogous to the proof of Theorem
6 in the same reference. ■



Proof of Theorem 30: Observe that the total variation distance between two cdfs is
bounded by the sum of the total variation distances between the corresponding discrete and
continuous parts. Furthermore, recall that the total variation distance between the absolutely
continuous parts is bounded from above by the $L_1$-distance of the corresponding densities. Hence,
from Q and ( 13 1 we obtain 



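$$\left\| H^{\ast}_{H,n,\theta,\sigma} - H_{H,n,\theta,\sigma} \right\|_{TV} \le A + B$$

where

$$A = \left| P_{n,\theta,\sigma}\left(\tilde\theta_{H,i} = 0\right) - P_{n,\theta,\sigma}\left(\hat\theta_{H,i} = 0\right) \right|$$

and

$$\begin{aligned}
B &= \int_{-\infty}^{\infty} \left| \int_0^\infty \left[ 1\left( |\alpha_{i,n}^{-1}x + \theta_i/\sigma| > s\xi_{i,n}\eta_{i,n} \right) - 1\left( |\alpha_{i,n}^{-1}x + \theta_i/\sigma| > \xi_{i,n}\eta_{i,n} \right) \right] \rho_{n-k}(s)\,ds \right| \frac{n^{1/2}}{\alpha_{i,n}\xi_{i,n}}\, \phi\left( \frac{n^{1/2}x}{\alpha_{i,n}\xi_{i,n}} \right) dx \\
&\le \int_0^\infty \int_{-\infty}^{\infty} 1\left( n^{1/2}\eta_{i,n}(s \wedge 1) < |u + n^{1/2}\theta_i/(\sigma\xi_{i,n})| \le n^{1/2}\eta_{i,n}(s \vee 1) \right) \phi(u)\,du\,\rho_{n-k}(s)\,ds \\
&= \int_0^\infty \Big\{ \left[ \Phi\left( n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}(s \vee 1)) \right) - \Phi\left( n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) + \eta_{i,n}(s \wedge 1)) \right) \right] \\
&\qquad\quad + \left[ \Phi\left( n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}(s \wedge 1)) \right) - \Phi\left( n^{1/2}(-\theta_i/(\sigma\xi_{i,n}) - \eta_{i,n}(s \vee 1)) \right) \right] \Big\} \rho_{n-k}(s)\,ds,
\end{aligned}$$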

where we have made use of Fubini's theorem and performed an obvious substitution. By a trivial
modification of Lemma 13 in Pötscher and Schneider (2010) we conclude that for every $\varepsilon > 0$
there exists a real number $c = c(\varepsilon) > 0$ such that

$$\int_{|s-1| > (n-k)^{-1/2}c} \rho_{n-k}(s)\,ds \le \varepsilon \qquad (29)$$






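for every $n - k > 0$. Using the fact that $\Phi$ is globally Lipschitz with constant $(2\pi)^{-1/2}$, this
gives

$$\begin{aligned}
\sup_{\theta \in \mathbb{R}^k,\, 0<\sigma<\infty} B &\le 2 \int_{|s-1| > (n-k)^{-1/2}c} \rho_{n-k}(s)\,ds + 2(2\pi)^{-1/2} n^{1/2}\eta_{i,n} \int_{|s-1| \le (n-k)^{-1/2}c} |(s \vee 1) - (s \wedge 1)|\, \rho_{n-k}(s)\,ds \\
&\le 2\varepsilon + 2(2\pi)^{-1/2} n^{1/2}\eta_{i,n}(n-k)^{-1/2}c.
\end{aligned}$$

The r.h.s. now converges to $2\varepsilon$ because $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$. Since $\varepsilon > 0$ was arbitrary, this
shows that $\sup_{\theta \in \mathbb{R}^k,\, 0<\sigma<\infty} B$ converges to zero. Note also that $\sup_{\theta \in \mathbb{R}^k,\, 0<\sigma<\infty} A$ has already been
shown to converge to zero in Proposition 13. This completes the proof for the hard-thresholding
estimator.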

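With the same argument as above we obtain

$$\left\| H^{\ast}_{S,n,\theta,\sigma} - H_{S,n,\theta,\sigma} \right\|_{TV} \le A + B$$

where

$$A = \left| P_{n,\theta,\sigma}\left(\tilde\theta_{S,i} = 0\right) - P_{n,\theta,\sigma}\left(\hat\theta_{S,i} = 0\right) \right|$$

and

$$\begin{aligned}
B &= \int_{-\infty}^{\infty} \left| \int_0^\infty \left[ \phi\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\eta_{i,n} \right) - \phi\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}s\eta_{i,n} \right) \right] \rho_{n-k}(s)\,ds \right| 1\left( \alpha_{i,n}^{-1}x + \theta_i/\sigma > 0 \right) \frac{n^{1/2}}{\alpha_{i,n}\xi_{i,n}}\,dx \\
&\quad + \int_{-\infty}^{\infty} \left| \int_0^\infty \left[ \phi\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}\eta_{i,n} \right) - \phi\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) - n^{1/2}s\eta_{i,n} \right) \right] \rho_{n-k}(s)\,ds \right| 1\left( \alpha_{i,n}^{-1}x + \theta_i/\sigma < 0 \right) \frac{n^{1/2}}{\alpha_{i,n}\xi_{i,n}}\,dx,
\end{aligned}$$
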

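where we have used (10) and (15). Now,

$$B \le \int_0^\infty \left( B_1(s) + B_2(s) \right) \rho_{n-k}(s)\,ds$$

where

$$B_1(s) = \int_{-\infty}^{\infty} \left| \phi(u + n^{1/2}\eta_{i,n}) - \phi(u + n^{1/2}s\eta_{i,n}) \right| du,$$

$$B_2(s) = \int_{-\infty}^{\infty} \left| \phi(u - n^{1/2}\eta_{i,n}) - \phi(u - n^{1/2}s\eta_{i,n}) \right| du,$$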


and where we have used Fubini's theorem and an obvious substitution. It is elementary to verify
that

$$B_1(s) = B_2(s) = 2 \left| \Phi\left( n^{1/2}\eta_{i,n}(s-1)/2 \right) - \Phi\left( -n^{1/2}\eta_{i,n}(s-1)/2 \right) \right|,$$

and that $B_i(s) \le 2$ holds. Consequently, using (29) we obtain



$$\begin{aligned}
B &\le 4 \int_{|s-1| > (n-k)^{-1/2}c} \rho_{n-k}(s)\,ds + \int_{|s-1| \le (n-k)^{-1/2}c} \left( B_1(s) + B_2(s) \right) \rho_{n-k}(s)\,ds \\
&\le 4\varepsilon + 4(2\pi)^{-1/2} n^{1/2}\eta_{i,n} \int_{|s-1| \le (n-k)^{-1/2}c} |s-1|\, \rho_{n-k}(s)\,ds \\
&\le 4\varepsilon + 4(2\pi)^{-1/2} n^{1/2}\eta_{i,n}(n-k)^{-1/2}c,
\end{aligned}$$

where we have again used the fact that $\Phi$ is globally Lipschitz with constant $(2\pi)^{-1/2}$. Since
$n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ and $\varepsilon > 0$ was arbitrary, the proof for soft-thresholding is complete,
because $\sup_{\theta \in \mathbb{R}^k,\, 0<\sigma<\infty} A$ goes to zero by Proposition 13.
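
The closed form for the $L_1$-distance between two shifted standard normal densities, and the Lipschitz bound applied to it, can be checked numerically. The sketch below is illustrative only: `a` is a hypothetical shorthand for $n^{1/2}\eta_{i,n}$, the function names are not from the paper, and the integral is approximated by a midpoint rule on a truncated range.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def b1_numeric(a, s, lo=-15.0, hi=15.0, steps=30000):
    """Midpoint-rule approximation of the integral of |phi(u + a) - phi(u + a*s)|."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        u = lo + (j + 0.5) * h
        total += abs(phi(u + a) - phi(u + a * s)) * h
    return total


def b1_closed(a, s):
    """Closed form 2*|Phi(a(s-1)/2) - Phi(-a(s-1)/2)|."""
    d = a * (s - 1.0) / 2.0
    return 2.0 * abs(Phi(d) - Phi(-d))


def lipschitz_bound(a, s):
    """Bound obtained from the Lipschitz constant (2*pi)^(-1/2) of Phi."""
    return 2.0 * a * abs(s - 1.0) / math.sqrt(2.0 * math.pi)
```

Comparing `b1_numeric(3.0, 1.4)` with `b1_closed(3.0, 1.4)` reproduces the identity up to quadrature error, and `b1_closed` never exceeds `min(2, lipschitz_bound)`, which are the two bounds combined in the last display.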



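Finally, from (11) and (16) we obtain

$$\begin{aligned}
\left\| H^{\ast}_{AS,n,\theta,\sigma} - H_{AS,n,\theta,\sigma} \right\|_{\infty}
&\le \int_0^\infty \sup_{x \in \mathbb{R}} \left| \Phi\left( z^{(1)}_{n,\theta,\sigma}(x, \eta_{i,n}) \right) - \Phi\left( z^{(1)}_{n,\theta,\sigma}(x, s\eta_{i,n}) \right) \right| \rho_{n-k}(s)\,ds \\
&\quad + \int_0^\infty \sup_{x \in \mathbb{R}} \left| \Phi\left( z^{(2)}_{n,\theta,\sigma}(x, \eta_{i,n}) \right) - \Phi\left( z^{(2)}_{n,\theta,\sigma}(x, s\eta_{i,n}) \right) \right| \rho_{n-k}(s)\,ds \\
&=: \int_0^\infty C_1(s)\rho_{n-k}(s)\,ds + \int_0^\infty C_2(s)\rho_{n-k}(s)\,ds.
\end{aligned}$$
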

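Observe that on the one hand $C_1(s)$ and $C_2(s)$ are bounded by 1, and that on the other hand,
using the Lipschitz-property of $\Phi$ and the mean-value theorem,

$$\begin{aligned}
|C_1(s)| &\le (2\pi)^{-1/2} \sup_{x \in \mathbb{R}} \left| z^{(1)}_{n,\theta,\sigma}(x, \eta_{i,n}) - z^{(1)}_{n,\theta,\sigma}(x, s\eta_{i,n}) \right| \\
&= (2\pi)^{-1/2} \sup_{x \in \mathbb{R}} \left| \sqrt{ \left( 0.5\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\theta_i/(\sigma\xi_{i,n}) \right) \right)^2 + n\eta_{i,n}^2 } - \sqrt{ \left( 0.5\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\theta_i/(\sigma\xi_{i,n}) \right) \right)^2 + s^2 n\eta_{i,n}^2 } \right| \\
&\le (2\pi)^{-1/2}\, n\eta_{i,n}^2\, |s-1| \sup_{x \in \mathbb{R}} \bar{s} \left( \left( 0.5\left( n^{1/2}x/(\alpha_{i,n}\xi_{i,n}) + n^{1/2}\theta_i/(\sigma\xi_{i,n}) \right) \right)^2 + \bar{s}^2 n\eta_{i,n}^2 \right)^{-1/2},
\end{aligned}$$

where $\bar{s}$ is a mean-value between $s$ and 1 which may depend on $x$. The supremum over $x$ on the
r.h.s. is now clearly assumed for $x = -\alpha_{i,n}\theta_i/\sigma$, resulting in the bound

$$|C_1(s)| \le (2\pi)^{-1/2} n^{1/2}\eta_{i,n} |s-1|.$$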



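The same bound is obtained for $C_2$ in exactly the same way. Consequently, using (29) we obtain

$$\begin{aligned}
\sup_{\theta \in \mathbb{R}^k,\, 0<\sigma<\infty} \left\| H^{\ast}_{AS,n,\theta,\sigma} - H_{AS,n,\theta,\sigma} \right\|_{\infty}
&\le 2 \int_{|s-1| > (n-k)^{-1/2}c} \rho_{n-k}(s)\,ds + 2(2\pi)^{-1/2} n^{1/2}\eta_{i,n} \int_{|s-1| \le (n-k)^{-1/2}c} |s-1|\, \rho_{n-k}(s)\,ds \\
&\le 2 \left[ \varepsilon + (2\pi)^{-1/2} n^{1/2}\eta_{i,n}(n-k)^{-1/2}c \right].
\end{aligned}$$

Since $n^{1/2}\eta_{i,n}(n-k)^{-1/2} \to 0$ and $\varepsilon > 0$ was arbitrary, the proof is complete. ■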



Proof of Theorem 33: (a) The atomic part of $dH^{\ast}_{H,n,\theta^{(n)},\sigma_n}$ as given in (13) clearly
converges weakly to the atomic part of (21) in view of Theorem 11(a1) and the fact that
$\alpha_{i,n}\theta_{i,n}/\sigma_n = n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i$ by assumption; also note that the atomic part converges to
the zero measure in case $|\nu_i| = \infty$ or $e_i = 0$ as then the total mass of the atomic part converges
to zero. We turn to the absolutely continuous part next. For later use we note that what has
been established so far also implies that the total mass of the absolutely continuous part
converges to the total mass of the absolutely continuous part of the limit, since it is easy to see that
the limiting distribution given in the theorem has total mass 1. The density of the absolutely
continuous part of (13) takes the form

$$\phi(x) \int_0^\infty 1\left( |x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})| > s n^{1/2}\eta_{i,n} \right) \rho_{n-k}(s)\,ds.$$

Observe that for given $x \in \mathbb{R}$, the indicator function in the above display converges to
$1(|x + \nu_i| > s e_i)$ for Lebesgue almost all $s$. [If $e_i = 0$, this is necessarily true only for $x \in \mathbb{R}$ with
$x \neq -\nu_i$.] Since $n - k = m$ eventually, we get from the dominated convergence theorem that the
above display converges to $\phi(x) \int_0^\infty 1(|x + \nu_i| > s e_i)\, \rho_m(s)\,ds$ for every $x \in \mathbb{R}$ (for every $x \in \mathbb{R}$
with $x \neq -\nu_i$ in case $e_i = 0$), which is the density of the absolutely continuous part in (21).
Since the total mass of the absolutely continuous part is preserved in the limit as shown above,
the proof is completed by Scheffé's Lemma.

(b) Follows immediately from Proposition 27 and Theorem 30. ■



Proof of Theorem 34: (a) The atomic part of $dH^{\ast}_{S,n,\theta^{(n)},\sigma_n}$ as given in (15) converges
weakly to the atomic part of (22) in view of Theorem 11(a1) and the fact that $\alpha_{i,n}\theta_{i,n}/\sigma_n =
n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \to \nu_i$ by assumption; also note that the atomic part converges to the zero
measure in case $|\nu_i| = \infty$ or $e_i = 0$ as then the total mass of the atomic part converges to
zero. We turn to the absolutely continuous part next. For later use we note that what has been
established so far also implies that the total mass of the absolutely continuous part converges
to the total mass of the absolutely continuous part of the limit, since it is easy to see that
the limiting distribution given in the theorem has total mass 1. The density of the absolutely
continuous part of (15) takes the form

$$\int_0^\infty \phi\left( x + s n^{1/2}\eta_{i,n} \right) \rho_{n-k}(s)\,ds\; 1\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) > 0 \right) + \int_0^\infty \phi\left( x - s n^{1/2}\eta_{i,n} \right) \rho_{n-k}(s)\,ds\; 1\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) < 0 \right).$$

Observe that for given $x \in \mathbb{R}$, the functions $\phi(x \pm s n^{1/2}\eta_{i,n})$ converge to $\phi(x \pm s e_i)$, respectively,
for all $s$. Since $n - k = m$ eventually, we then get from the dominated convergence theorem that
the above display converges to

$$\int_0^\infty \phi(x + s e_i)\, \rho_m(s)\,ds\; 1(x + \nu_i > 0) + \int_0^\infty \phi(x - s e_i)\, \rho_m(s)\,ds\; 1(x + \nu_i < 0)$$

for every $x \neq -\nu_i$; the last display is precisely the density of the absolutely continuous part in
(22). Since the total mass of the absolutely continuous part is preserved in the limit as shown
above, the proof is completed by Scheffé's Lemma.

(b) Follows immediately from Proposition 28 and Theorem 30. ■



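Proof of Theorem 35: (a) Observe that

$$\begin{aligned}
H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x) &= \int_0^\infty \Phi\left( z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right) \rho_{n-k}(s)\,ds\; 1\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \ge 0 \right) \\
&\quad + \int_0^\infty \Phi\left( z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right) \rho_{n-k}(s)\,ds\; 1\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) < 0 \right) \qquad (30)
\end{aligned}$$

where $z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n})$ and $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n})$ reduce to

$$0.5\left( x - n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \right) \mp \sqrt{ \left( 0.5\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) \right) \right)^2 + s^2 n\eta_{i,n}^2 }.$$

Clearly, $\Phi\left( z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right)$ as well as $\Phi\left( z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right)$ converge for every $s \ge 0$ to

$$\Phi\left( 0.5(x - \nu_i) - \sqrt{ (0.5(x + \nu_i))^2 + s^2 e_i^2 } \right)$$

and

$$\Phi\left( 0.5(x - \nu_i) + \sqrt{ (0.5(x + \nu_i))^2 + s^2 e_i^2 } \right),$$

respectively, if $|\nu_i| < \infty$, and the dominated convergence theorem shows that the weights of the
indicator functions in (30) converge to the corresponding weights in (23). Since $n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n})$
converges to $\nu_i$ by assumption, it follows that for every $x \neq -\nu_i$ we have convergence of
$H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ to the cdf given in (23). This proves part (a) in case $|\nu_i| < \infty$. In case $\nu_i = \infty$,
we have that $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n})$ converges to $x$ by an application of Proposition 15 in Pötscher
and Schneider (2009). Consequently, the limit of $\Phi\left( z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right)$ is now $\Phi(x)$. Again
applying the dominated convergence theorem and observing that for each $x \in \mathbb{R}$ we have that
$1\left( x + n^{1/2}\theta_{i,n}/(\sigma_n\xi_{i,n}) < 0 \right)$ is eventually zero, shows that $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ converges to $\Phi(x)$.
The case $\nu_i = -\infty$ is proved analogously.

(b) Follows immediately from Proposition 29 and Theorem 30. ■

Proof of Theorem 36: Observe that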

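$$\sigma_n^{-1}\alpha_{i,n}\left( \tilde\theta_{H,i} - \theta_{i,n} \right) = -\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\; 1\left( \tilde\theta_{H,i} = 0 \right) + n^{-1/2}\eta_{i,n}^{-1} Z_n\; 1\left( \tilde\theta_{H,i} \neq 0 \right)$$

where $Z_n$ is standard normally distributed. The expressions in front of the indicator functions
now converge to $-\zeta_i$ and 0, respectively, in probability as $n \to \infty$. Inspection of the cdf of
$\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{H,i} - \theta_{i,n})$ then shows that this cdf converges weakly to

$$\lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{H,i} = 0 \right) \delta_{-\zeta_i} + \left( 1 - \lim_{n \to \infty} P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{H,i} = 0 \right) \right) \delta_0$$

if $|\zeta_i| < \infty$. Part (b) of Theorem 11 completes the proof of both parts of the theorem in case
$|\zeta_i| < \infty$. If $|\zeta_i| = \infty$ the same theorem shows that the weak limit is now $\delta_0$. ■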



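Proof of Theorem 37: (a) The atomic part of $dH^{\ast}_{S,n,\theta^{(n)},\sigma_n}$ as given in (15) converges
weakly to the atomic part given in (24) by Theorem 11(b1). The density of the absolutely
continuous part of $dH^{\ast}_{S,n,\theta^{(n)},\sigma_n}$ can be written as

$$\int_{-\infty}^{\infty} n^{1/2}\eta_{i,n}\, \phi\left( n^{1/2}\eta_{i,n}(x + s) \right) \rho_m(s)\,ds\; 1\left( x + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) > 0 \right) + \int_{-\infty}^{\infty} n^{1/2}\eta_{i,n}\, \phi\left( n^{1/2}\eta_{i,n}(x - s) \right) \rho_m(s)\,ds\; 1\left( x + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) < 0 \right),$$

recalling the convention that $\rho_m(s) = 0$ for $s < 0$. Note that with this convention $\rho_m$ is
then a bounded continuous function on the real line. Since $n^{1/2}\eta_{i,n}\phi\left( n^{1/2}\eta_{i,n}(x + \cdot) \right)$ and
$n^{1/2}\eta_{i,n}\phi\left( n^{1/2}\eta_{i,n}(x - \cdot) \right)$ clearly converge weakly to $\delta_{-x}$ and $\delta_x$, respectively, the density of
the absolutely continuous part of $dH^{\ast}_{S,n,\theta^{(n)},\sigma_n}$ is seen to converge to $\rho_m(-x) 1(x + \zeta_i > 0) +
\rho_m(x) 1(x + \zeta_i < 0)$ for every $x \neq -\zeta_i$. An application of Scheffé's Lemma then completes the
proof, noting that the total mass of the absolutely continuous part of $dH^{\ast}_{S,n,\theta^{(n)},\sigma_n}$ converges to
the total mass of the absolutely continuous part of (24) as the same is true for the atomic part
in view of Theorem 11(b1) (and since the distributions involved all have total mass 1).

(b) Rewrite $\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{S,i} - \theta_{i,n})$ as

$$-\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\; 1\left( \tilde\theta_{S,i} = 0 \right) + \left( W_n - (\hat\sigma/\sigma_n)\, \mathrm{sign}\left( W_n + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \right) \right) 1\left( \tilde\theta_{S,i} \neq 0 \right)$$

where $W_n$ is a sequence of $N(0, n^{-1}\eta_{i,n}^{-2})$-distributed random variables. Observe that $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})$
converges to $\zeta_i$ and that $W_n$ converges to zero in $P_{n,\theta^{(n)},\sigma_n}$-probability. Now, if $|\zeta_i| < 1$, then
$P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{S,i} = 0 \right) \to 1$ by Theorem 11(b2), and hence $\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{S,i} - \theta_{i,n})$ converges to $-\zeta_i$
in $P_{n,\theta^{(n)},\sigma_n}$-probability. This proves the result in case $|\zeta_i| < 1$. In case $|\zeta_i| > 1$ we have that

$$P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{S,i} \neq 0 \right) \to 1$$

and

$$P_{n,\theta^{(n)},\sigma_n}\left( \mathrm{sign}\left( W_n + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \right) = \mathrm{sign}(\zeta_i) \right) \to 1. \qquad (31)$$

Clearly, also $\hat\sigma/\sigma_n$ converges to 1 in $P_{n,\theta^{(n)},\sigma_n}$-probability since $n - k \to \infty$. Consequently,
$\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{S,i} - \theta_{i,n})$ converges to $-\mathrm{sign}(\zeta_i)$ in $P_{n,\theta^{(n)},\sigma_n}$-probability, which proves the case $|\zeta_i| > 1$.
Finally, if $|\zeta_i| = 1$, then (31) continues to hold and we can write

$$\begin{aligned}
\sigma_n^{-1}\alpha_{i,n}\left( \tilde\theta_{S,i} - \theta_{i,n} \right) &= \left( -\zeta_i + o(1) \right) 1\left( \tilde\theta_{S,i} = 0 \right) - \left( o_p(1) + (1 + o_p(1))\, \mathrm{sign}(\zeta_i) \right) 1\left( \tilde\theta_{S,i} \neq 0 \right) \\
&= -\mathrm{sign}(\zeta_i) + o_p(1),
\end{aligned}$$

where $o_p(1)$ refers to a term that converges to zero in $P_{n,\theta^{(n)},\sigma_n}$-probability. This then completes
the proof of part (b). ■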



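Proof of Theorem 38: (a) Assume first that $0 \le \zeta_i < \infty$ holds. Note that $z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n})$
and $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n})$ now reduce to

$$n^{1/2}\eta_{i,n} \left[ 0.5\left( x - \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \right) \mp \sqrt{ \left( 0.5\left( x + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \right) \right)^2 + s^2 } \right].$$

First, for $x > -\zeta_i$ we see that $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ eventually reduces to

$$\int_0^\infty \Phi\left( z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right) \rho_m(s)\,ds.$$

Furthermore, for $x > 0$ we see that $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to \infty$ for all $s \ge 0$, whereas for $-\zeta_i <
x < 0$ we have that $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to \infty$ for $s > \sqrt{-x\zeta_i}$ and $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to -\infty$
for $s < \sqrt{-x\zeta_i}$. As a consequence, we obtain from the dominated convergence theorem that
$H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ converges to 1 for $x > 0$ and to $\int_{\sqrt{-x\zeta_i}}^{\infty} \rho_m(s)\,ds$ for $-\zeta_i < x < 0$. Second, for
$x < -\zeta_i$ note that $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ eventually reduces to

$$\int_0^\infty \Phi\left( z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right) \rho_m(s)\,ds$$

and that $z^{(1)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to -\infty$ for all $s \ge 0$ in this case. This shows that for $x < -\zeta_i$ we
have that $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ converges to 0. But this proves the result for the case $0 \le \zeta_i < \infty$. In
case $\zeta_i = \infty$ the same reasoning shows that now $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}(x)$ eventually reduces to

$$\int_0^\infty \Phi\left( z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \right) \rho_m(s)\,ds$$

for all $x$, and that now for $x > 0$ we have $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to \infty$ for all $s \ge 0$ whereas for $x < 0$
we have that $z^{(2)}_{n,\theta^{(n)},\sigma_n}(x, s\eta_{i,n}) \to -\infty$ for all $s \ge 0$. This shows that $H^{\ast}_{AS,n,\theta^{(n)},\sigma_n}$ converges
weakly to $\delta_0$ in case $\zeta_i = \infty$. The proof for the case $-\infty \le \zeta_i < 0$ is completely analogous.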
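(b) Rewrite $\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{AS,i} - \theta_{i,n})$ as

$$\begin{aligned}
& -\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\; 1\left( \tilde\theta_{AS,i} = 0 \right) + \left( W_n - (\hat\sigma^2/\sigma_n)\, \xi_{i,n}\eta_{i,n}/\hat\theta_{LS,i} \right) 1\left( \tilde\theta_{AS,i} \neq 0 \right) \\
&= -\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})\; 1\left( \tilde\theta_{AS,i} = 0 \right) + \left( W_n - (\hat\sigma^2/\sigma_n^2) \left( W_n + \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n}) \right)^{-1} \right) 1\left( \tilde\theta_{AS,i} \neq 0 \right)
\end{aligned}$$

where $W_n$ is a sequence of $N(0, n^{-1}\eta_{i,n}^{-2})$-distributed random variables. Note that $\theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})$
converges to $\zeta_i$ by assumption. Now, if $|\zeta_i| < 1$, then $P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{AS,i} = 0 \right) \to 1$ by Theorem
11(b2), hence $\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{AS,i} - \theta_{i,n})$ converges to $-\zeta_i$ in $P_{n,\theta^{(n)},\sigma_n}$-probability, establishing the
result in this case. Furthermore, for $1 \le |\zeta_i| \le \infty$ rewrite the above display as

$$\begin{aligned}
& \left( -\zeta_i + o(1) \right) 1\left( \tilde\theta_{AS,i} = 0 \right) + \left( o_p(1) - (1 + o_p(1)) \left( \zeta_i + o_p(1) \right)^{-1} \right) 1\left( \tilde\theta_{AS,i} \neq 0 \right) \\
&= \left( -\zeta_i + o(1) \right) 1\left( \tilde\theta_{AS,i} = 0 \right) + \left( -\zeta_i^{-1} + o_p(1) \right) 1\left( \tilde\theta_{AS,i} \neq 0 \right),
\end{aligned}$$

with the convention that $\zeta_i^{-1} = 0$ in case $|\zeta_i| = \infty$. If $|\zeta_i| > 1$ (including the case $|\zeta_i| = \infty$)
then $P_{n,\theta^{(n)},\sigma_n}\left( \tilde\theta_{AS,i} \neq 0 \right) \to 1$ by Theorem 11(b2), and hence the last display shows that
$\sigma_n^{-1}\alpha_{i,n}(\tilde\theta_{AS,i} - \theta_{i,n})$ converges to $-\zeta_i^{-1}$ in $P_{n,\theta^{(n)},\sigma_n}$-probability, establishing the result in this
case. Finally, if $|\zeta_i| = 1$ holds, then the last line in the above display reduces to $-\zeta_i + o_p(1)$,
completing the proof of part (b). ■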
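
The threshold $s = \sqrt{-x\zeta_i}$ appearing in part (a) can be seen from the sign of the bracketed factor of $z^{(2)}$. The sketch below assumes that, under the scaling of this theorem, $z^{(2)}$ equals $n^{1/2}\eta_{i,n}$ times $0.5(x - t) + \sqrt{(0.5(x + t))^2 + s^2}$ with $t = \theta_{i,n}/(\sigma_n\xi_{i,n}\eta_{i,n})$, and it evaluates the bracket at the limit $t = \zeta_i$. The function names are hypothetical and only serve as an illustration.

```python
import math


def z2_bracket(x, zeta, s):
    """Bracketed factor 0.5*(x - zeta) + sqrt((0.5*(x + zeta))**2 + s**2)."""
    return 0.5 * (x - zeta) + math.sqrt((0.5 * (x + zeta)) ** 2 + s * s)


def z2_limit(x, zeta, s):
    """Limit of z^(2) when the bracket is multiplied by a factor tending to infinity."""
    b = z2_bracket(x, zeta, s)
    if b > 0:
        return math.inf
    if b < 0:
        return -math.inf
    return 0.0  # boundary case s**2 == -x*zeta
```

For example, with `zeta = 2.0` and `x = -0.5` the threshold is `s = 1`: `z2_limit` returns `-inf` for `s` below 1 and `inf` for `s` above 1, so that in the limit $\Phi(z^{(2)})$ behaves like the indicator of $s > \sqrt{-x\zeta_i}$.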



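Proof of Proposition 39: (a) By a subsequence argument we may assume that $n - k$
converges in $\mathbb{N} \cup \{\infty\}$. Applying Theorem 11(b) we obtain that $P_{n,\theta,\sigma}\left( \tilde\theta_{H,i} = 0 \right)$ converges to
1 in case $\theta_i = 0$, and to 0 in case $\theta_i \neq 0$. Observe that

$$\sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \tilde\theta_{H,i} - \theta_i \right) = -n^{1/2}\theta_i/(\sigma\xi_{i,n})$$

holds on the event $\tilde\theta_{H,i} = 0$, while

$$\sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \tilde\theta_{H,i} - \theta_i \right) = \sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \hat\theta_{LS,i} - \theta_i \right) =: Z_n$$

holds on the event $\tilde\theta_{H,i} \neq 0$. The result then follows in view of the fact that $Z_n$ is standard
normally distributed. The proof for $\hat\theta_{H,i}$ is similar using Proposition 8(b) instead of Theorem
11(b) (it is in fact simpler as the subsequence argument is not needed).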

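(b) Again we may assume that $n - k$ converges in $\mathbb{N} \cup \{\infty\}$. By the same reference as in the
proof of (a) we obtain that $P_{n,\theta,\sigma}\left( \tilde\theta_{AS,i} = 0 \right)$ converges to 1 in case $\theta_i = 0$, and to 0 in case
$\theta_i \neq 0$. Now

$$\sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \tilde\theta_{AS,i} - \theta_i \right) = -n^{1/2}\theta_i/(\sigma\xi_{i,n})$$

holds on the event $\tilde\theta_{AS,i} = 0$ and the claim for $\theta_i = 0$ follows immediately. On the event $\tilde\theta_{AS,i} \neq 0$
we have from the definition of the estimator

$$\sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \tilde\theta_{AS,i} - \theta_i \right) = Z_n - (\hat\sigma/\sigma)^2\, n^{1/2} \sigma \xi_{i,n} \eta_{i,n}^2 / \hat\theta_{LS,i}.$$
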
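Now, if $\theta_i \neq 0$, then the event $\tilde\theta_{AS,i} \neq 0$ has probability approaching 1 as shown above. Hence,
we have on events that have probability tending to 1

$$\sigma^{-1} n^{1/2} \xi_{i,n}^{-1} \left( \tilde\theta_{AS,i} - \theta_i \right) = Z_n - o_p(1),$$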

since nyyf „ — )■ oo and Crn'^~^^^^i~n — ^ oo by the assumption and since 9i ^ 0; also note that tr/tr 
is stochastically bounded since the collection of distributions corresponding to with m € N is 
tight on $(0, \infty)$ as was noted earlier. The proof for $\hat\theta_{AS,i}$ is again similar (and simpler) by using
Proposition 8(b) instead of Theorem 11(b). ■

A Appendix

Recall that $\rho_m(x) = 0$ for $x < 0$.

Lemma 46 $(2m)^{-1/2} \rho_m\left( (2m)^{-1/2} t + 1 \right)$ converges to $\phi(t)$ in the $L_1$-sense as $m \to \infty$.

Proof. Observe that $(2m)^{-1/2} \rho_m\left( (2m)^{-1/2} t + 1 \right)$ is the density of $U_m = (2m)^{1/2} \left( \sqrt{\chi^2_m/m} - 1 \right)$
where $\chi^2_m$ denotes a chi-square distributed random variable with $m$ degrees of freedom. By the
central limit theorem and the delta-method $U_m$ converges in distribution to a standard normal
random variable. With

$$g_m(x) = 2^{-m/2} \left( \Gamma(m/2) \right)^{-1} x^{(m/2)-1} \exp(-x/2) \quad \text{for } x > 0$$

being the density of $\chi^2_m$ we have for $x > 0$

$$\begin{aligned}
\rho_m(x) = 2mx\, g_m(mx^2) &= 2^{1-m/2} \left( \Gamma(m/2) \right)^{-1} m^{1/2} \left( mx^2 \right)^{(m/2)-1/2} \exp\left( -mx^2/2 \right) \\
&= (8m)^{1/2}\, \Gamma((m+1)/2) \left( \Gamma(m/2) \right)^{-1} g_{m+1}(mx^2),
\end{aligned}$$

and we have $\rho_m(x) = 0$ for $x \le 0$. Since the cdf associated with $g_{m+1}$ is unimodal, this shows
that the same is true for the cdf associated with $\rho_m$. But then convergence in distribution of $U_m$
implies convergence of $(2m)^{-1/2} \rho_m\left( (2m)^{-1/2} t + 1 \right)$ to $\phi(t)$ in the $L_1$-sense by a result of Ibragimov
(1956), Scheffé's Lemma, and a standard subsequence argument. ■
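
The $L_1$-convergence asserted in Lemma 46 can be observed numerically. The following sketch evaluates $\rho_m$ through the formula $\rho_m(x) = 2mx\,g_m(mx^2)$ from the proof (in log space, to avoid overflow of the gamma function) and approximates the $L_1$-distance to $\phi$ by a midpoint rule on a truncated range. The function names, the integration range, and the grid size are illustrative choices, not part of the lemma.

```python
import math


def rho(x, m):
    """Density of sqrt(chi2_m / m): rho_m(x) = 2*m*x*g_m(m*x^2), zero for x <= 0."""
    if x <= 0:
        return 0.0
    y = m * x * x
    log_g = (-(m / 2) * math.log(2) - math.lgamma(m / 2)
             + (m / 2 - 1) * math.log(y) - y / 2)
    return math.exp(math.log(2 * m * x) + log_g)


def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def l1_distance(m, lo=-12.0, hi=12.0, steps=24000):
    """Approximate L1-distance between the density of U_m and phi."""
    scale = math.sqrt(2.0 * m)
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        t = lo + (j + 0.5) * h  # midpoint rule
        total += abs(rho(t / scale + 1.0, m) / scale - phi(t)) * h
    return total
```

Evaluating `l1_distance(m)` for `m = 5, 50, 500, 5000` gives a decreasing sequence, consistent with the lemma. The computation says nothing about the rate beyond what is observed at these particular values.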
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